Data Bases: Concepts and Types

Topics to be Covered
• Organizing Information
• DBMSs in Organizations
• Traditional DBMS (data base management systems)
  – flat files (in reality, no DBMS)
  – navigational data bases
    • Hierarchical & Network
  – relational data base (RDBMS)
  – The move to Object-Oriented concepts
• Geographic or Spatial Data Base Models
  – CAD
  – Coverages/Georelational Data Model
  – Shapefiles
  – Geodatabase Data Model
• Appendix: Object Oriented Implementations
Tomlinson discusses these concepts in Chapter 9.
Next time, we will cover database design details.

3/23/2016 Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation

Organizing Information: Tables

Parcel Table
Parcel #  Address       Block  $ Value
8         501 N Hi      1      105,450
9         590 N Hi      2      89,780
36        1001 W. Main  4      101,500
75        1175 W. 1st   12     98,000

Information is generally stored in tables in which:
• rows: records, observations, features (ArcInfo and ArcGIS), concepts or entities
  – ‘all’ information about one occurrence of a feature
• columns: fields, data elements, variables, items (ArcInfo), properties or attributes
  – contain info about a specific characteristic for all the features
Data Base Management Systems (DBMS) are software systems for managing these tables
• DBMSs differ according to the number of tables and how they are associated with each other

Organizing Information: Entities & Attributes

name  address     dob  ssn
jane  201 N. Hi   45   274-54-8910
joan  207 N Main  55   234-81-7890
jim   20 Elm      75   890-75-9876
jean  40 Oak      80   x04-23-7890
(each row is an entity; each column is an attribute)

• Each column has a unique name
• Each row is unique and has an ID
• Sequence of columns is insignificant
• Sequence of rows is insignificant
• Entries in columns are single-valued
• Entries in columns are of the same kind

• Entity: person, place, thing or event about which information is maintained
  – student/citizen/customer, street segment, polygon
  – entities are stored in records (physical location)
• Attribute: a piece of information describing an entity
  – name, ssn, dob, gpa; length, direction, street number; population, median home value
  – attributes are stored in fields (physical location)
  – called a data element (logical reference) in a DBMS
• Primary key (or key field): an attribute which uniquely identifies each instance of an entity
  – preferably numeric, such as ssn (but that’s a bad choice!)
• Foreign (secondary) key: an attribute or set of attributes which identifies the entity with which another entity is associated
  – May or may not be the same as the key field
DBMSs differ in how these associations are formed

Organizing Information: Classes
• Object Classes: a set of non-spatial entities with similar characteristics
  – e.g. owners of property
• Feature Classes: a set of spatial entities with similar characteristics
  – e.g. property parcels
Classes are represented in tables, which are physically stored in the computer system as files.
Users/applications may access this information directly through the operating system’s native file structure, or indirectly via an intervening software layer called a DBMS (database management system).

Information Organization: Physical
• Database: collection of related files, e.g. all property records
• Files (tables are stored in files): collection of related records, e.g. residence owners in Plano
• Records (entities are stored in records): collection of related items, e.g. your name, address, tax bill
• Fields (attributes are stored in fields): set of characters forming one item, e.g. your name
• bytes
• bits

Databases within Organizations

Organizations without DBMS

Organization  User Department  Computer Application   Data File (stored as OS file)
              Accounting       Payroll Application    Employee File
              Purchasing       Purchasing Application Vendor File
City          Tax Collection   Tax Bills              Land Parcel File
City          Utilities        Water Bills            Utility Hook-up File
School        Student Affairs  Student Billing        Student File
School        Development      Solicitation Request   Donor File

Organizations with DBMS
• Departments and their applications: Payroll Dept. (Payroll Application), Purchasing Dept. (Purchasing Application), xxxx Dept. (Billing Application)
• All applications access the data through the Data Base Management System
• Integrated Data Base
  – person entity: id, address, name, ssn, courses taken, job title
  – property entity: id, location, owner, utility connect
  – corporate entity: id, address, name, activity
Ideally, all data in a DBMS is:
• stored once
• defined consistently
• used by all programs
In practice, this is something of a pipedream!

Why Data Base Management Systems (DBMS)?
• with OS files, the application programmer must define the data needed and specify its characteristics and location
  – files often tied to one application/user
  – diagram: User/Programmer (logical and physical) -> OS File Structure
• DBMS provides an interface between the application program and the physical data files stored by the OS
• with a DBMS, the user/programmer defines only the data needed; the DBMS tracks physical location and characteristics
• DBMS presents a logical view of the data to the user/programmer, while maintaining internally a physical view of where and how the data is actually stored
  – files more easily shared between applications/users
  – diagram: User/Programmer (logical: data element) -> DBMS -> OS File Structure (physical: field)
• Ideally, all data in a DBMS is:
  – stored once
  – defined consistently
  – used by all programs
  (In practice, this is something of a pipedream!)

DBMS Components:
• data dictionary: inventory of data elements; defines and stores their characteristics:
  – physical characteristics (size, type)
  – location
  – ownership and security
  – usage (last date, business organization, programs, reports, etc.)
• data definition or administration language: language used by the data base administrator to specify the content and structure (the schema) of the data base
  – Originally this was unique to each vendor, and still is to a degree
  – UML (Unified Modeling Language) now provides a standardized, visual-based approach for creating schemas
• data manipulation language: commands permitting end-users and/or programmers to extract and transform the data
  – structured query language (SQL) is the standard
  – Applications often contain point & click interfaces which generate SQL queries

Traditional DBMSs (data base management systems)

Flat File Systems: no DBMS
• A few large ‘rectangular’ tables, each contained in an operating system (OS) file
  – one (often very large) file contains all data for a particular application
  – each normally associated with a particular group of users
• one field (or a combination) in each table designated as key field
  – unique identifier for each record
  – used to sort the file
  – records identified by key value can be found quickly
• processing focused on search, with several alternative methods:
  – sequential search: from first record on down
  – binary search: starts at middle; successively eliminates ‘halves’ with greater than/less than tests
  – indexed search: search (usually binary) via a second (index) file containing only the key field and the address of the record in the main file
    • smaller file to search, therefore faster
    • only need to re-sort the index file, not the big main file, when records are added
    • can have several index files for different keys
    • (note: index files are used in other contexts, e.g. geocoding in ArcView, to speed processing)
• problems with flat file structures
  – data items often blank over multiple records, so wastes computer resources (but disks are cheap)
  – data items often repeated over multiple records, so wasteful, with possible inconsistency
  – access to records by any field other than the key field is slow
  – adding fields requires major re-programming
  – adding new records requires additional processing of indexes

Flat File Example

Property File (Parcel Number is the key field)
Parcel Number  Parcel Address  Block  District  Tract  Owner #1 Name   Owner #1 Address  Owner #2 Name  Owner #2 Address  Value
8              501 Sadowski    1      a         101    Sadowski, M.G.  501 Sadowski                                       105450
9              590 Sadowski    2      b         1      Adams           590 Sadowski      Adams, M       590 Sadowski      89780
36             1001 Adnan      4      b         105    Sadowski, M.G.  501 Sadowski                                       101500
75             1175 Dadlexz    12     e         202    Kroeger         592 Tierney       Bertrand, K    1097 Bertrand     98000

One record for each parcel. All information must be recorded in the file. Access to the data in a record is accomplished by searching through all records for a value in one field, called a key field.

Data Base Management Systems (DBMS): Navigational File Systems (hierarchical & network)
e.g. IBM’s IMS
Characteristics
• multiple files, each with a different record structure (i.e. different fields)
• a record designated as master or parent record
• each parent record can have multiple child records associated with it
• in turn, child records can themselves be parents and have children
• pointers track the parent-child links
• Hierarchical: child has only one parent (one-to-many from parent perspective)
• Network: child can be related to multiple parents (many-to-many relationship)
Problems
• can only access a record via a parent
  – must navigate through the hierarchy/network structure
• pointer structure can become very complex (especially for network)
  – takes up more space than the data
  – difficult for systems staff to manage
  – incomprehensible to users
No longer used except for “legacy” applications (applications not yet converted to current technology).

Hierarchical DBMS:
• City level: Dallas, Mesquite, Fresno, Garland
• District level: A, B, C
• Block level: 1, 2, 4, 12
• Parcel level: one parcel table per block, for example:

Parcel Table
Parcel #  Address    $ Value
8         501 N Hi   105,450
10        590 N Oak  89,780
6         1001 N Hi  101,500
3         75 W. 3rd  98,000

Parcel Table
Parcel #  Address   $ Value
9         590 N Hi  89,780
63        15 N Ash  101,500

To produce a map of values by district would involve tracing down from the district table through the block table to the parcel tables associated with each block.

Data Base Management Systems (DBMS): Relational Systems
Characteristics
• again, multiple tables (‘files’), each with a different record structure
• tables can be related if a common record identifier or secondary (foreign) key (column) is present in both tables
  – relations are created on the fly without the need to maintain pointers
  – relate: temporary connection between two tables
  – join: permanent merger of two tables into one
  – (the relate/join distinction is not adhered to by ESRI in ArcGIS)
Problems
• high computational requirements if many joins are needed
• tables and ‘entity relationships’ need to be carefully planned for efficiency
• assumes data is amenable to record/field, observation/variable representation. What about graphics?
Examples
• ESRI’s INFO (UNIX platform originally, now also MS NT/XP)
• IBM’s DB2 (mainframe and others)
• Oracle, Ingres, Sybase, Informix (Unix originally, now also MS NT/2000/XP)
• SQL Server, Access (Windows NT/2000/XP)

Relational DBMS:

Parcel Table (Parcel # is the primary key field; Block is the secondary or foreign key field)
Parcel #  Address       Block  $ Value
8         501 N Hi      1      105,450
9         590 N Hi      2      89,780
36        1001 W. Main  4      101,500
75        1175 W. 1st   12     98,000

Geography Table (Block is the primary key field)
Block  District  Tract  City
1      A         101    Dallas
2      B         101    Dallas
4      B         105    Dallas
12     E         202    Garland

Goal: produce a map of values by district/neighborhood
Problem: no district code available in the Parcel Table
Solution: join the Parcel Table, containing values, with the Geography Table, containing location codings, using Block as the key field

The Move to Object-Oriented Architectures
Classic RDBMS problems:
• Data model
  – tabular, record oriented
  – can’t easily handle ‘globs’: maps, circuit diagrams, engineering drawings
  – data and code are functionally separate
• Distribution model
  – focus on server
  – limited client activity; no local data base
• Transaction model
  – short term, immediate: insert, update, delete
  – no long term check out (for design, salesman on road, server unavailable, etc.)
OODBMS Approach
  – object: data encapsulated by code, thus can be “data” or “software code”
  – a “thing” but not necessarily physical (e.g. power pole, sewer network or hash-mark table, zoom capability)
  – objects identified by LOID (logical object ID)
  – check out an object to work on it, perhaps by multiple people (concept of ‘versioning’: see Zeiler Chap. 7)
  – reuse objects (software or data) in new programs
  – complements rather than replaces the RDBMS
  – Data objects stored as complex rows in tables

Object Concepts: objects
Classic Procedural Programming
  – step by step instructions
  – data and operations separate
  – One programmer’s code often “crashed” when used with another’s
OO Programming
  – operations and data grouped into objects
  – objects responsible for carrying out their own operations
  – objects re-useable
  – User needs to know only how to communicate with the object, not how it works inside
All objects (e.g. a software button, or a water valve) have
  – properties
    • characteristics of the object (stored in instance variables)
    • e.g. button/valve size, color
  – events or interfaces
    • actions an object recognizes
    • invoke its behaviors
    • e.g. being clicked or turned on/off
  – methods (or behaviors)
    • actions associated with or carried out by the object
    • e.g. open browser, allow water flow

Object Concepts: classes
• All objects are part of higher-order object classes: they inherit what they know and can do from their ancestor object class.
• All objects in a class share certain characteristics and capabilities, but each also has some unique characteristic(s) or capabilities.
• When an event happens to an object (e.g. it is mouse-clicked) it reacts in a certain way according to its data and code.
Classes are sometimes viewed as blueprints or templates for objects
Data + class = object
Examples:
• You are a human object. You
  – are different from cats and fish (other animate object classes)
  – share many characteristics with other humans (same object class): two feet, spine, etc.
  – have certain characteristics that make you, you
• As a member of the human class, if smoke gets in your eyes (an event happens), your eyes tear/blink (behavior)
  – nobody taught you this; you inherited it because of your human ancestor class
• “land property” is a higher-order object class (super class): modify certain characteristics to create residential, commercial, agricultural, etc. lower-order property classes, and again single, multi, group lower-order residential classes
• As a residential property object, when October 1st occurs (event), you issue a tax bill
  – You inherit this behavior because of your land property ancestor class

Spatial (Geographic) Data Models
We now examine how spatial data models have evolved from traditional to object oriented.

Computer Aided Design (CAD)
• CAD Data Model (1960s & 1970s)
  – Maps created with general purpose computer aided design (CAD) software
  – The primary approach in the 1960s and 1970s
  Problems:
  – Essentially only a graphic: no topological relationships, so no knowledge that two lines (roads) intersect
  – Different features may be combined in the same layer or feature class: roads and railroads in the same line layer or feature class
  – Limited attribute information and no data base: generally just map layer names and annotation labels
• CAD Model Today (2000s)
  – Still the basis for engineering drawings
    • AutoCAD and Intergraph are the industry leaders
  – ESRI’s attempt to develop its own competing CAD product (ArcCAD) is now abandoned
  – ESRI’s strategy today is to handle CAD drawings within GIS software as “just another data type”
  – ArcGIS 9 has powerful capabilities for reading CAD files.
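A minimal Python sketch of the first CAD problem listed above: because a CAD layer holds only geometry, the fact that two road lines cross is stored nowhere and has to be recomputed from coordinates. The layer contents and the function names (`segments_intersect`, `find_crossings`) are illustrative assumptions, not part of any CAD or ESRI product.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)


def _within_box(p, q, r):
    """True if point r lies inside the bounding box of segment p-q."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(a, b, c, d):
    """Standard orientation test: does segment a-b touch or cross segment c-d?"""
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear cases: an endpoint of one segment lies on the other segment.
    return ((o1 == 0 and _within_box(a, b, c)) or (o2 == 0 and _within_box(a, b, d))
            or (o3 == 0 and _within_box(c, d, a)) or (o4 == 0 and _within_box(c, d, b)))


# A CAD-style line layer: bare geometry, no attributes, no stored connectivity.
# Roads and a railroad share the layer, as in the problems listed above (made-up coordinates).
cad_layer = [
    ((0, 0), (10, 10)),   # road
    ((0, 10), (10, 0)),   # road crossing the first one
    ((20, 0), (30, 0)),   # railroad
]


def find_crossings(lines):
    """Compare every pair of lines; this is the work that stored topology avoids."""
    return [(i, j)
            for i in range(len(lines))
            for j in range(i + 1, len(lines))
            if segments_intersect(*lines[i], *lines[j])]


print(find_crossings(cad_layer))
```

Every query repeats the pairwise comparison, and nothing in the layer says which lines are roads; a topological data model stores such relationships with the data instead.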

Coverage (“Georelational”)
• Coverage Data Model (1980s and 1990s)
  – Introduced with the first commercial GIS package, ArcInfo, in 1981
  – Spatial data stored in indexed binary files for performance reasons
  – Full topological relationship information maintained: e.g. nodes that delimit a line
    • Permitted sophisticated spatial analysis
  – Attribute data about features (entities) stored in data base tables using the proprietary INFO relational data base system
    • Allowed user to customize, organize and store substantial amounts of attribute data and relate it to spatial data
  Problems:
  – Complex structure: multiple files in folders within a Workspace directory
  – Non-industry-standard data base (INFO)
    • User must maintain two data bases: INFO and industry standard (e.g. ORACLE)
    • corporate data often must be duplicated in INFO attribute tables
    • General corporate users isolated from spatial data
  – Features (entities) represented as generic points, lines, polygons
    • Behavior of a “stream” line same as a “road” line
    • Complex AML (Arc Macro Language) programming required to represent unique behavior
• Coverage Model Today (2000s)
  – now (as of ArcInfo Version 8 in 2000) called the Georelational model
  – Has been replaced by the Geodatabase
  – Many organizations are in the process of converting, but it’s a big job

Shapefile
• Shapefile Data Model (1990s)
  – Introduced with ArcView 2.0 in the early/mid 1990s
  – openly published structure for spatial data (Coverages are proprietary)
    • Partially an attempt (a successful one!) by ESRI to make “their” format the industry standard
  – much simpler than coverages: 3 main files rather than multiple folders and files
  – Attribute (feature) data stored in dBase (.dbf) files
  – Very successful and popular
  Problems:
  – Spatial data not fully topological
  – dBase data base out-dated and abandoned by the market
• Shapefile today (2000s)
  – Still a useful model for its simplicity: it will endure
  – Especially valuable for sharing/moving spatial data between different vendors’ formats
  – Never intended as an industrial strength spatial data base system
  – For internal operations, the Geodatabase is preferred: it’s a far more powerful model
    • Shapefile is a file-based model
    • Geodatabase is a database-based model

Smart Data Compression (SDC) format (an aside)
• A proprietary “flat file” format from ESRI used by commercial data vendors to distribute data sets to be used with ESRI software (e.g. highway network files)
• Data is encrypted, highly compressed and read only unless converted to a standard format (shapefile, etc.) with ArcCatalog
• Essentially, it is a copyright protection device and (generally) an inconvenience for users
• An SDC feature (vector) dataset is a table of attributes containing one or more Shape columns. Each Shape column contains a different representation of the same feature. In ArcCatalog you’ll see one SDC feature class for each Shape column.
• All feature classes in the feature dataset will have the same type of features (points, lines, or polygons) and the same set of attributes.
• Typically each SDC feature class would be used at different map scales. For example, an SDC feature dataset representing major highways might have four feature classes: majhwys, majhwys_1, majhwys_2, and majhwys_3. The majhwys feature class contains the most detailed features, while the majhwys_3 feature class contains the most generalized features.
  – A group layer can take advantage of this by showing the majhwys feature class at large scales and the majhwys_3 feature class at small scales.
• To create data in SDC format you must use a Data Developer's Kit. However, ArcGIS provides conversion tools that let you convert from SDC data to shapefiles and other formats.

Geodatabase
• Geodatabase Data Model
  – Introduced with ArcInfo v.8 in 2000
  – Built on object-oriented concepts and technology
  Aims
  – handle complex geographic data with a uniform data model
  – Data model is independent of the underlying relational data base (RDBMS)
    • Can be stored in any industry standard DBMS (e.g. Access, Oracle or SQL Server)
    • Technically, it’s referred to as an application tier on top of the DBMS
  – Integrity of data is enhanced through data rules rather than by writing special code:
    • Attribute domains: range domains and coded value domains
      – Control permissible values for attributes
      – e.g. # of lanes must be an integer between 1 and 12
    • Validation rules: attribute rules, connectivity rules, relationship rules
      – e.g. a dirt road cannot intersect with a freeway
  – Data rules, plus code if necessary, can give objects behavior: they are closer to their real world and logical model equivalents
    • Objects are poles, roads, parcels rather than points, lines, polygons
  – see Zeiler, Chapter 1 for overview, Chapter 5 for detail
• Geodatabase Today (2000s)
  – The preferred approach to use today
  – Is a true database, unlike shapefiles
    • Powerful capabilities (domains, validation rules, etc.) for ensuring data integrity and simplifying data entry
  – Can be incorporated into “industry standard” data bases such as Access, SQL Server, Oracle
Much more detail later!
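
As a rough illustration of how data rules can replace special-purpose code, the Python sketch below models a range domain, a coded value domain and one connectivity rule, using the two examples from the slide (lanes between 1 and 12; a dirt road cannot intersect a freeway). The class names, the `surface` and `road_class` attributes and their code lists are assumptions made for this sketch; this is not the geodatabase API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeDomain:
    """Hypothetical range domain: value must be an integer in [low, high]."""
    low: int
    high: int

    def allows(self, value) -> bool:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return (isinstance(value, int) and not isinstance(value, bool)
                and self.low <= value <= self.high)


@dataclass(frozen=True)
class CodedValueDomain:
    """Hypothetical coded value domain: value must be one of a fixed set of codes."""
    codes: frozenset

    def allows(self, value) -> bool:
        return value in self.codes


# Domains attached to the attributes of an illustrative "road" feature class.
ROAD_DOMAINS = {
    "lanes": RangeDomain(1, 12),                                                  # from the slide
    "surface": CodedValueDomain(frozenset({"dirt", "gravel", "paved"})),          # assumed codes
    "road_class": CodedValueDomain(frozenset({"local", "arterial", "freeway"})),  # assumed codes
}


def validate_attributes(feature: dict) -> list:
    """Return the domain violations for one feature (an empty list means valid)."""
    return [f"{name}={feature.get(name)!r} is outside its domain"
            for name, domain in ROAD_DOMAINS.items()
            if not domain.allows(feature.get(name))]


def may_connect(road_a: dict, road_b: dict) -> bool:
    """Connectivity rule from the slide: a dirt road cannot intersect a freeway."""
    def is_dirt(road):
        return road["surface"] == "dirt"

    def is_freeway(road):
        return road["road_class"] == "freeway"

    return not ((is_dirt(road_a) and is_freeway(road_b))
                or (is_freeway(road_a) and is_dirt(road_b)))


freeway = {"lanes": 8, "surface": "paved", "road_class": "freeway"}
track = {"lanes": 1, "surface": "dirt", "road_class": "local"}
print(validate_attributes(freeway), may_connect(track, freeway))
```

The rules live in data (`ROAD_DOMAINS`) rather than in application logic, so tightening a domain means editing a table entry, which is the point the slide makes about data rules versus special code.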

Geodatabase: two formats available
• Personal Geodatabase
  – Single user editing only
  – Max of 250,000 features per layer (feature class), 2GB total database size
  – Implemented in MS Access
  – Requires no additional licensing
• Enterprise Multi-user Geodatabase
  – Supports simultaneous editing by multiple users via versioning
  – Each user simultaneously edits a different version of the same layer
    • Potential conflicts are resolved when versions are consolidated
  – implemented via Spatial Database Engine (SDE), an extra cost product
  – SDE provides the connection to a mainstream object-relational database (e.g. Oracle, IBM’s DB2, Microsoft SQL Server) which stores the geographic data, another extra cost
  [Diagram: GIS User / Server / SDE / db]

Spatial (Geographic) Data Models from Other Vendors
• Focus above is on spatial data models from ESRI, the GIS market leader
• Other GIS vendors (MapInfo, Intergraph GeoMedia, etc.) have equivalent data models, although generally less sophisticated than the geodatabase
• Oracle Spatial is offered by Oracle Corporation, the database market leader

Oracle Spatial
• Offered by Oracle Corporation as a capability for storing geographic data
  – Originally called Spatial Data Option (SDO) when first released with Oracle 7.3.3
  – Version 8i provided a new, object-relational data model
• Is a native capability completely independent of ESRI
  – Not the same as using ESRI’s SDE to save a Geodatabase in Oracle
• Supposedly possible (although we have not actually done it) for ArcGIS to directly read data stored in Oracle Spatial
• Oracle Spatial includes:
  – a tool to read shapefiles into Oracle Spatial
  – MapViewer, a primitive “GIS” application for viewing spatial data
  – Various commands for doing spatial analysis, for example:
    • SDO_BUFFER
    • SDO_NN (identify nearest neighbor)
    • SDO_DISTANCE
    • SDO_NN_DISTANCE
• Could be viewed as Oracle’s attempt to compete with ESRI in the GIS arena
  – More plausibly, allows existing Oracle users to incorporate geographic analysis in their Oracle application
    • e.g. allow an insurance company to estimate potential claims from a hurricane

Conclusion
• Object oriented, or more specifically object-relational, databases are now the norm
• They are much more sophisticated than their predecessors
• They can substantially reduce the need for coding and custom programming, and they improve data quality
But there are costs:
• Their effective use requires planning
• Many decisions are required for efficient use
These issues will be addressed in:
  dbdecisions.ppt
  dbdesign.ppt

Appendix: Object Oriented Implementations

Object Concepts: terminology
• Components
  – objects implemented in binary code (essentially synonymous with object)
• Encapsulation
  – the packaging of properties and methods (behaviors) into objects, with access only through a well defined set of software rules or interfaces (internal detail of objects is ‘hidden’)
• Instances and Instantiation
  – objects (actual “things”, e.g. my single family [SF] residence) are instances of classes (purely conceptual, e.g. residential class)
  – An object is instantiated when data (e.g. from a table) is added to class events & methods (class + data = object)
• Inheritance
  – sharing or “passing down” of properties and methods from higher order (general) to lower order (specific) classes
  – Square footage passes down from all property to residential to SF residence, but # of bedrooms only from residential to SF residence
• Associations
  – Objects have various types of relationships with each other
    • General association: a person owns a parcel
    • Inheritance association: a house is a type of building

Object Concepts: IT software implementation
• Sun’s Java (1990-91)
  – intended to be platform independent
• Object Management Group (CORBA)
  – attempt at an industry standard
  – But Microsoft went its own way
• Microsoft (MS)
  – OLE (object linking and embedding)
  – COM (Component Object Model) with ActiveX components
    • components are used in MS development environments such as Visual Basic (part), Visual C++, Visual J++
    • ArcGIS version 8 and later is written using COM objects
  – .NET released in 2003 to extend OO programming to the web
    • wrappers provide backward compatibility for COM objects

Object Concepts: Initial GIS software implementation
• Smallworld
  – first OO-based package
  – Now owned by GE and focuses on utility systems management
• Intergraph
  – first major GIS vendor to embrace MS OLE/COM with its GeoMedia product
• ESRI ArcView 3 & Avenue (early 1990s)
  – proprietary OO objects and associated development environment (Avenue)
• ArcInfo 7.1.2 ODE (Open Development Environment) (mid 1990s)
  – attempt to create objects out of classic ArcInfo code
• ESRI Map Objects (late 1990s)
  – Limited suite of COM objects for use with development packages such as VB, Delphi
  – Intended for incorporating geographic analysis in non-GIS application systems
  – Java edition released in 2002
  – Does not require an ArcGIS desktop license to run, but royalties must be paid
  – Objects not compatible with ArcObjects (see next slide)
  – Has been replaced by ArcGIS Engine

Object Concepts: Current ESRI GIS software implementation
• ArcGIS (8.0 released 2000, 9.0 released 2004)
  – ArcView 3 and ArcInfo 7 integrated and totally rewritten using COM/ActiveX
• ArcObjects
  – The objects out of which ArcGIS is built
  – Are available for customizing and extending ArcGIS applications via:
• Visual Basic for Applications
  – Allows customization and extended capability within ArcGIS desktop
  – Requires an ArcGIS desktop license to run these ArcObjects customizations
• ArcGIS Engine (released with version 9 of ArcGIS in 2004)
  – Allows ArcObjects to be used in building standalone, client-based applications
  – Requires royalty payment, but no client GIS license
  – Runs under Windows, Unix, Linux, with support for Java, C++, COM and .NET
  – Replaces MapObjects
• ArcGIS Server (released with version 9 of ArcGIS in 2004)
  – Permits the creation of server-based GIS capabilities
  – Provides GIS capabilities to a user without a desktop GIS system via a web interface
  – No client royalty or license required
  – Supports .NET and Java on Windows, and Java on Sun Solaris and Linux

Object Concepts: database implementation
• Database Vendors
  – specialized OO data base vendors, such as Versant, now gone
  – most mainstream relational database vendors (Oracle, etc.) incorporate “glob” data
  – Referred to as the “object-relational” model
• GIS Vendors (ESRI specifically)
  – Georelational database: classic non-OO model
    • special purpose relational database (INFO)
    • Coverages store vector data using the point/line/polygon concept
    • raster data and TINs kept in separate files
  – Geodatabase: new OO model with two formats
    • Personal geodatabase implemented in MS Access
      – Single user editing with no versioning
      – Max of 250,000 features per layer (feature class), 2GB total database size
    • Enterprise Multi-user geodatabase implemented via a Spatial Database Engine (SDE) connection to a mainstream OO-capable relational database (e.g. Oracle)
      – Multiuser support with versioning (multiple users simultaneously edit different versions of the same layer)
      [Diagram: GIS User / Server / SDE / db]
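
To close the appendix, here is a minimal Python sketch of the object concepts it defines: instantiation (class + data = object), inheritance from land property to residential to single family residence, and an event (October 1st) invoking an inherited behavior (issuing a tax bill). The class names, the sample square footage and bedroom count, and the wording of the bill are assumptions for illustration; the parcel number and value come from the Parcel Table used earlier.

```python
from datetime import date


class LandProperty:
    """Higher-order (super) class: properties and behavior shared by all land property."""

    def __init__(self, parcel_id, square_footage, value):
        self.parcel_id = parcel_id
        self.square_footage = square_footage   # passes down to every subclass
        self.value = value

    def issue_tax_bill(self):
        """Method (behavior) carried out by the object itself."""
        return f"Tax bill for parcel {self.parcel_id} (assessed value {self.value})"

    def on_date(self, today):
        """Event handler: October 1st invokes the tax bill behavior."""
        if (today.month, today.day) == (10, 1):
            return self.issue_tax_bill()
        return None


class Residential(LandProperty):
    """Lower-order class: adds a property that only residential classes have."""

    def __init__(self, parcel_id, square_footage, value, bedrooms):
        super().__init__(parcel_id, square_footage, value)
        self.bedrooms = bedrooms


class SingleFamilyResidence(Residential):
    """Inherits square footage from LandProperty and bedrooms from Residential."""


# Instantiation: class + data (one row of a table) = object.
row = {"parcel_id": 8, "square_footage": 1800, "value": 105450, "bedrooms": 3}
home = SingleFamilyResidence(**row)

print(home.on_date(date(2016, 10, 1)))   # inherited behavior fires on the event
print(home.on_date(date(2016, 3, 23)))   # any other day: no bill
```

`SingleFamilyResidence` defines nothing of its own, yet it responds to the October 1st event because the behavior is inherited from `LandProperty`, while a plain `LandProperty` object has no `bedrooms` attribute at all.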