L10 - The GIS Database – Part 1
Chapter 8

Entity
• Bangor
  – Penobscot County, Maine, United States
  – Centroid: 44.801N, 68.778W
  – Area: 34.4 square miles
  – Elevation: 158 feet
  – Population: 31,473

Standard GIS Data Model
Linked spatial and attribute data.

What is a database?
A database is any organized collection of data. Some examples of databases you may encounter in your daily life are:
  – a telephone book
  – T.V. Guide
  – airline reservation system
  – motor vehicle registration records
  – papers in your filing cabinet
  – files on your computer hard drive

Data vs. information: What is the difference?
• What is data?
  – Data can be defined in many ways. Information science defines data as unprocessed information.
• What is information?
  – Information is data that have been organized and communicated in a coherent and meaningful manner.
  – Data is converted into information, and information is converted into knowledge.
  – Knowledge is information evaluated and organized so that it can be used purposefully.

Database Definitions
What is a database? It is an organized collection of data.
What is a database management system? A database management system (DBMS) provides the software tools you need to organize that data in a flexible manner. It includes tools to add, modify, or delete data from the database, ask questions (or queries) about the data stored in the database, and produce reports summarizing selected contents.
What is the ultimate purpose of a database management system? To transform Data → Information → Knowledge → Action.

Features of a DBMS
• Database management systems provide features to maintain the database:
  – Data independence
  – Integrity and security
  – Transaction management
  – Concurrency control
  – Backup and recovery
  – A language for the creation and querying of the database
  – A language for writing application programs

Selecting a Database Management System
Database management systems (DBMSs) can be divided into two categories: desktop databases and server databases.
• Generally speaking, desktop databases are oriented toward single-user applications and reside on standard personal computers (hence the term desktop).
• Server databases contain mechanisms to ensure the reliability and consistency of data and are geared toward multi-user applications.

Database Terminology

Relational Databases
• The relational database model is the dominant model in both the corporate and GIS worlds, due to its flexibility, organization, and functioning.
• It was defined by Edgar F. Codd (1970).
• It can accommodate a wide range of data types.
• It is not necessary to know beforehand the types of processing that will be performed on the database.

Relational Database Terminology
• Each table contains the data for a single entity.
• Each instance of an entity is a row/record/tuple in a table.
• Columns contain attributes/fields that describe the entity.
  – All values of an attribute must be from the same domain (text, integer, date).
  – Column order has no significance.
• Tables are related through keys.

Attributes
• An entity is represented by a set of attributes, that is, descriptive properties possessed by all members of an entity set.
• Domain: the set of permitted values for each attribute.
• Attribute types:
  – Simple and composite attributes
  – Single-valued and multi-valued attributes
    • E.g. multivalued attribute: phone-numbers
  – Derived attributes
    • Can be computed from other attributes
    • E.g. age, given date of birth

Keys
• A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity.
• A candidate key of an entity set is a minimal super key.
  – customer-id is a candidate key of customer.
  – account-number is a candidate key of account.
• Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.
• A composite key is a key with more than one attribute.
• A foreign key is an attribute that is a key of one or more relations other than the one in which it appears.

Keys
[Figure: example tables with their primary key and foreign key columns labeled]

Relational Algebra
• Codd's specification of a relational database relied on relational algebra.
• Relational algebra takes tables/relations as inputs and returns tables as outputs.
• The algebra combines or splits tables by rows or by columns to generate either a subset of a table or an expanded table.

Fundamental building blocks
[Figure: employee table]
Tables are the fundamental building blocks of any database. The table above contains the employee information for an organization: characteristics like name, date of birth, and title. Examine the construction of the table and you'll find that each column of the table corresponds to a specific employee characteristic (or attribute in database terms). Each row corresponds to one particular employee and contains his or her information.

Relational Algebra
• Five basic operators
  – select: σ
  – project: π
  – union: ∪
  – difference: −
  – Cartesian product: ×
• The operators take one or two relations as inputs and produce a new relation as a result.
• Derived relational operators
  – Intersection
  – Divide (not used very often)
  – Join
• These can be expressed using different combinations of the fundamental operators.

Select Operation – Example
Relation r:

  A   B   C   D
  α   α   1   7
  α   β   5   7
  β   β   12  3
  β   β   23  10

σ_{A=B ∧ D>5}(r): select from relation r where A=B AND D>5

  A   B   C   D
  α   α   1   7
  β   β   23  10

Database Queries
• Queries may be made of one table or several tables at the same time.
• In many systems, querying is facilitated by icons, menus, or query by example (QBE, a graphical query language).

SQL
• DDL – Data Definition Language; used to create and manage the database.
• DML – Data Manipulation Language; used to query the database.
• SQL: widely used non-procedural language
  – E.g. find the name of the customer with customer-id 192-83-7465

        select customer.customer-name
        from customer
        where customer.customer-id = '192-83-7465'

  – E.g. find the balances of all accounts held by the customer with customer-id 192-83-7465

        select account.balance
        from depositor, account
        where depositor.customer-id = '192-83-7465'
          and depositor.account-number = account.account-number

• Application programs generally access databases through one of
  – Language extensions to allow embedded SQL
  – Application program interfaces (e.g. ODBC/JDBC), which allow SQL queries to be sent to a database

Attribute Queries
The ArcGIS Attribute Query Interface
[Figure: ArcGIS attribute query interface]
• Table is open: the table's Options menu lists Related tables, Select by attributes, Switch selection, Clear selection, Zoom to selected, and Delete selected.
• No table is open: choose Selection > Select by Attributes from the menu bar.

A Spatial Query in SQL

        SELECT city.name, city.geometry
        FROM city, county
        WHERE county.name = 'Penobscot'
          AND city.geometry INSIDE county.geometry
          AND city.population > 30000;

Spatial Selection

Table Joins
• Table joins depend on the data, not the attribute name.
• There are many different types of table joins.
• Tables can be joined regardless of the relationship EXCEPT:
  – When joining to the feature attribute table, the relationship must be 1:1 or M:1.
  – Other relationships must use a relate.
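The bank queries from the SQL slide, and the point that a join matches on data values rather than on column names, can be tried out with Python's built-in sqlite3 module. This is a minimal sketch and not part of the lecture: every row of sample data (customer names, account numbers, balances) is invented for illustration, and the hyphenated column names from the slide are written with underscores here so that SQLite accepts them without quoting.

```python
import sqlite3

# In-memory database; all rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer  (customer_id TEXT PRIMARY KEY, customer_name TEXT);
    CREATE TABLE account   (account_number TEXT PRIMARY KEY, balance REAL);
    CREATE TABLE depositor (customer_id TEXT REFERENCES customer(customer_id),
                            account_number TEXT REFERENCES account(account_number));

    INSERT INTO customer  VALUES ('192-83-7465', 'Johnson'),
                                 ('019-28-3746', 'Smith');
    INSERT INTO account   VALUES ('A-101', 500.0), ('A-201', 900.0), ('A-215', 700.0);
    INSERT INTO depositor VALUES ('192-83-7465', 'A-101'),
                                 ('192-83-7465', 'A-201'),
                                 ('019-28-3746', 'A-215');
""")

# Attribute query on a single table: the name of one customer.
names = conn.execute(
    "SELECT customer_name FROM customer WHERE customer_id = ?",
    ("192-83-7465",),
).fetchall()

# Query across two tables: rows are paired wherever the account numbers
# hold the same value, which is what makes this a join.
balances = conn.execute(
    """
    SELECT account.balance
    FROM depositor, account
    WHERE depositor.customer_id = ?
      AND depositor.account_number = account.account_number
    ORDER BY account.balance
    """,
    ("192-83-7465",),
).fetchall()

print(names)
print(balances)
```

Passing the customer-id as a `?` parameter instead of pasting it into the SQL string is a design choice of this sketch; it is the usual way to send values through an application program interface.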
One-to-One Join

  Employee-id  Job                      Employee-id  Name
  1            Digislave                1            Tom
  2            Useless Supervisor       2            John

Join Employee-id to Employee-id. After join:

  Employee-id  Job                 Name
  1            Digislave           Tom
  2            Useless Supervisor  John

A join does not permanently alter the table structure.

Many-to-One Join

  Polygon ID  Symbol       Symbol  Description
  1           Qa           Qa      Quaternary Alluvium
  2           Qa           Qe      Quaternary Eolian
  3           Pa           Pa      Permian Abo
  4           Qe

After join on Symbol:

  Polygon ID  Symbol  Description
  1           Qa      Quaternary Alluvium
  2           Qa      Quaternary Alluvium
  3           Pa      Permian Abo
  4           Qe      Quaternary Eolian

Relate in a GIS
http://courses.washington.edu/gis250/lessons/introduction_gis/relati

Relational Database Modeling
• Entity Relationship Model
  – The result is a diagram of all of the entities, their attributes, and the relationships between entities.
  – Each entity becomes a table.
  – Each relationship becomes a table.
• Database Normalization
  – The result is a number of entity tables. We still need to create the relationship tables.

Entity Relationship Diagram
[Figure: legend showing the ENTITY, RELATIONSHIP, and ATTRIBUTE symbols]
• Rectangles represent entity sets.
• Diamonds represent relationship sets.
• Lines link attributes to entity sets and entity sets to relationship sets.
• Ellipses represent attributes.
  – Double ellipses represent multivalued attributes.
  – Dashed ellipses denote derived attributes.
• Underline indicates primary key attributes (will study later).

Entity Relationship Diagram
[Figure: example diagram in which Student "Enrolls In" Courses. Attribute labels shown: Name, Address, ID, Name, Number, Phone, Major, Advisor, Credits, GPA, Credits, Time, Instructor, Room]

Types of Relationships between Entities
• 1:1 – one faculty member is assigned to one office.
• 1:M (M:1) – one faculty member teaches many courses.
• M:N – many students take many courses.

Orono Charter Airlines
Orono Charter Airlines is an aircraft charter company that supplies on-demand charter flight services to corporate customers, using a fleet of four different aircraft.
For each plane, they need to keep track of the number of passengers it can hold, the charge per seat/mile, the manufacturer, model name, total miles flown, and date of last annual maintenance. Aircraft are always identifiable by a unique registration number. For each flight, the company must keep track of the date, pilot and copilot's names, the plane used, the destination, distance, air time, ground time, and fuel and oil used. In addition to pilots and copilots, the company has customer service personnel, mechanics, and maintenance personnel. In addition to the normal employee information, they must keep additional information on the pilots, such as their license number, their aircraft rating, and when they had their last medical.

Entity Sets
• Aircraft
• Customer
• Charter
• Pilot
• Employee
• Model

Relationship Sets
Customer (1) Requests (M) Charter

Adding the Attributes
Customer (1) Requests (M) Charter
• Customer: LicenseNum, Name, StreetAddr, CityAddr, State, Zip, Phone, CreditCardType, CreditCardNum
• Charter: Date, PlaneRegNum, PilotName, CoPilotName, Destination, Distance, Airtime, GroundTime, FuelUsed, OilUsed

Orono Airlines
[Figure: complete entity relationship diagram]
Entities and their attributes:
• Customer: LicenseNo, CustomerName, CustomerAddress, CreditCard, CreditCardNumber
• Charter: Date, PlaneRegistrationNo, PilotName, CopilotName, Destination, Distance, AirTime, GroundTime, FuelUsed, OilUsed
• Aircraft: RegistrationNo, MilesFlown, DateLastMaintenance, NoPassengers, Charge_Seat_Mile
• Model: ModelName, Manufacturer, Address, Phone, ContactPerson
• Pilot: SSNo, LicenseNo, Rating, MedicalDate
• Employee: SSNo, Name, Address, Phone, JobTitle, Salary
Relationships:
• Customer (1) Requests (M) Charter
• Charter (M) Flies (1) Aircraft
• Charter (M) Pilots (1) Pilot
• Charter (M) CoPilots (1) Pilot
• Aircraft (M) References (1) Model
• Pilot (1) Is A (1) Employee

Database Normalization
• Start with a single relational schema containing all attributes and decompose it.
• The goal is to decrease the chance of inconsistent information in the database.
• There is a hierarchy of rules for decomposing the table.
• In most cases, only the first three are used.

Normalization Rules
• Begin with an unnormalized user view.
• To put all tables in first normal form (1NF), remove repeating groups.
• To put all tables in second normal form (2NF), remove partial dependencies.
• To put all tables in third normal form (3NF), remove all transitive dependencies.

Unnormalized Table

  Student_ID  Name   Major  CRN  Title  Instructor  Instructor_Office  Grade
  3421        Jones  SIS    333  Aaaaa  Holden      134 BD             A
                            222  Bbbbb  Taylor      222 M              C
  8725        Dow    BIO    555  Bbbbb  Taylor      222 M              B
                            666  Ccccc  Allen       351 B              B-
                            777  Dddd   Kemp        317 Nv             C+
  …

Remove repeating groups – creates two tables

Student Table

  StudentID  Name   Major
  3421       Jones  SIS
  8725       Dow    BIO
  …

Student_course Table

  Student_ID  CRN  Title   Instructor  Instructor_Office  Grade
  3421        333  Aaaaaa  Holden      125 Bd             A
  3421        222  Bbbbbb  Taylor      222 M              C
  8725        555  Bbbbbb  Taylor      222 M              B
  8725        666  Cccccc  Allen       351 B              B-
  8725        777  Dddddd  Kemp        317 Nv             C+

Remove partial dependencies
This step applies only to tables that have a composite key. Identify the attributes that depend on only part of the composite key and move them to a table of their own. Here Title, Instructor, and Instructor_Office depend on CRN alone, while Grade depends on both Student_ID and CRN.

Registration Table

  Student_ID  CRN  Grade
  3421        333  A
  3421        222  C
  8725        555  B
  8725        666  B-
  8725        777  C+

Course-Instructor Table

  CRN  Title   Instructor  Instructor_Office
  333  Aaaaaa  Holden      125 Bd
  222  Bbbbbb  Taylor      222 M
  555  Bbbbbb  Taylor      222 M
  666  Cccccc  Allen       351 B
  777  Dddddd  Kemp        317 Nv

To put a table in third normal form, remove transitive dependencies
A transitive dependency exists when a non-key attribute depends on another non-key attribute rather than directly on the key. Here Instructor_Office depends on Instructor, not on CRN.

Course Table

  CRN  Title   Instructor
  333  Aaaaaa  Holden
  222  Bbbbbb  Taylor
  555  Bbbbbb  Taylor
  666  Cccccc  Allen
  777  Dddddd  Kemp

Instructor Table

  Instructor  Instructor_Office
  Holden      125 Bd
  Taylor      222 M
  Allen       351 B
  Kemp        317 Nv

[Figure: the unnormalized schema and the four tables it decomposes into]
  Unnormalized:  Student_ID, Name, Major, CRN, Title, Instructor, Instructor_Office, Grade
  Student:       StudentID, Name, Major
  Registration:  Student_ID, CRN, Grade
  Course:        CRN, Title, Instructor
  Instructor:    Instructor, Instructor_Office

When NOT to Normalize
• Relates in large tables require large amounts of computer time to process.
  – If you have an application in which speed is more important than database structure.
• If you are creating a very simple database that will be used for a short time and then discarded.
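To make this trade-off concrete, the sketch below loads the four third-normal-form tables from the student example into SQLite (through Python's built-in sqlite3 module) and rebuilds the original user view from them. It is an illustration rather than part of the lecture: the lowercase table and column names are assumptions chosen to be valid SQL identifiers, and Holden's office uses the value given in the normalized tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student      (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT);
    CREATE TABLE instructor   (instructor TEXT PRIMARY KEY, instructor_office TEXT);
    CREATE TABLE course       (crn INTEGER PRIMARY KEY, title TEXT,
                               instructor TEXT REFERENCES instructor(instructor));
    CREATE TABLE registration (student_id INTEGER REFERENCES student(student_id),
                               crn INTEGER REFERENCES course(crn),
                               grade TEXT,
                               PRIMARY KEY (student_id, crn));  -- composite key

    INSERT INTO student VALUES (3421, 'Jones', 'SIS'), (8725, 'Dow', 'BIO');
    INSERT INTO instructor VALUES ('Holden', '125 Bd'), ('Taylor', '222 M'),
                                  ('Allen', '351 B'), ('Kemp', '317 Nv');
    INSERT INTO course VALUES (333, 'Aaaaaa', 'Holden'), (222, 'Bbbbbb', 'Taylor'),
                              (555, 'Bbbbbb', 'Taylor'), (666, 'Cccccc', 'Allen'),
                              (777, 'Dddddd', 'Kemp');
    INSERT INTO registration VALUES (3421, 333, 'A'), (3421, 222, 'C'),
                                    (8725, 555, 'B'), (8725, 666, 'B-'),
                                    (8725, 777, 'C+');
""")

# Rebuilding the user view takes three joins across the four tables.
user_view = conn.execute("""
    SELECT s.student_id, s.name, s.major, c.crn, c.title,
           c.instructor, i.instructor_office, r.grade
    FROM registration AS r
    JOIN student    AS s ON s.student_id = r.student_id
    JOIN course     AS c ON c.crn = r.crn
    JOIN instructor AS i ON i.instructor = c.instructor
    ORDER BY s.student_id, c.crn
""").fetchall()

for row in user_view:
    print(row)
```

Each office is stored exactly once, so moving Taylor to a new office means updating a single row. The price is the three joins needed to recover the view, which is the processing cost that the points above weigh against a clean structure.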