1. Query language overview 1.1 What is a query language? A query is an interesting question about data. A query language is a language that specifies a query to a database in declarative manner. 1.2 Natural language and formal language Natural language: human language like English, German, or Japanese. Its usage may change (disappear or appear, or modified) with time. Computer cannot interpret it directly Formal language: language that computer can interpret. There is no ambiguity in meaning (semantics). The grammar (syntax) does not change. programming and query languages are formal language. 1.2 Structured Query Language (SQL) SQL is a standard query language that supports the relational logical model such as primary, foreign key. Majority of DBMS support SQL (MS Access, Oracle, MySQL) GIS software uses SQL to interact with DBMS behind the screen when doing a query. 2. Relational algebra 2.1 RA properties Algebra: mathematical expressions consisting of two distinct sets - operands and operations. Relational algebra: a formal query language associated with relational data model. Relations are placed in operands instead of numbers, or symbols used in normal algebra, and the results are also relations (closure). Closure: the result of operations is the same type of objects. This allows to create nested operations that consist of a small and simple sets of operations. 2.2 RA operations a) Select σ: Retrieve a subset of tuples (rows) of a relation. A single relation is used. b) Project π: Retrieve a subset of field (columns) for all rows in a relation. A single relation is involved. Set operation c) Union: Requires two relations that are union compatible. A U B returns all tuples that are in either of relations d) Difference: Requires two relations that are union compatible. A – B returns all tuples in relation A that are not in the relation B e) Intersection: Requires two relations that are union compatible. A ∩ B returns all tuples that are in both relations. f) Cross-product: does not require union compatible and many relations can be involved. If you have n rows in relation A, and m rows in relation B, you will get n x m rows in resultant relation Union compatible: two relations are union compatible if they have the same number of columns, and the same domain and the columns are in the same order from left to right. g) Join Logical linking tuples across multiple relations. This is not primitive RA operation. Combine selection, projection, and cross product operation. Conditional joins: Conduct cross product to form all the possible combinations of tuples from tables and then selection of tuples that meets conditions. Natural joins (equi- joins): use only condition of equality in selection. 3. SQL Declarative query language, meaning that user tells the DBMS what you want to know, rather than how it is calculated. The algorithms to answer the question are not required. c) SQL components Data define language (DDL) To create, update (Alter) and delete (Drop) the relation or other elements of database such as views. Data manipulation Language (DML): To populate and edit table content, and to query the content of a database Data control language (DCL): to control the access control, permission 3.2. SQL DDL a) Create Table CREATE TABLE <table name> ("column 1" "data_type_for_column_1", "column 2" "data_type_for_column_2", ... Primary Key <attribute>, ... <attribute> Foreign Key <attribute>) So, if we are to create the customer table specified as above, we may type in CREATE TABLE customer (First_Name char(50), Last_Name char(50), Address char(50), City char(50), Country char(25), Birth_Date date Primary Key (First_Name, Last_Name)) Create view (View simplify complicated nested queries) CREATE VIEW <Name> AS SELECT FROM WHERE b) Alter Table: Change relation schema c) Drop Table: Delete an empty table DROP TABLE <table name> So, if we wanted to drop the table called customer that we created DROP TABLE customer DML: a) DML commands for populating and editing - Insert: Populate the tables INSERT INTO <table>(<attribute>, ...<attribute>) VALUES(<value>, ....<value>) - Update: Change values within a specified row - Delete: delete rows DELETE FROM <table> WHERE<condition> b) Basic Select Format: to query data in one or more tables Feature: consists of many clause. return a table. Allow nested query (difficult to understand). SELECT <columns> FROM <Relations> WHERE <condition> Order By <columns> : to specify the sorting columns for result Group By <columns> : to group together rows based on the column for statistical calculation Having <condition> : limit the result of GROUP BY result Note: If the limitation is based on individual attribute value, the condition is specified in WHERE clause. If the limitation is based on the group statistics calculated in Group By clause, condition is written in Having clause. 3. SQL 3.1. General information a) SQL & relational databases b) SQL3 & abstract data types - Operators and functions Arithmetic Comparison = (equal), < (less than), > (greater than), >= (greater than or equal), <= (greater than or equal), <> (not equal), Logical AND OR Combination of AND and OR (AND is computed first then OR, or use parent hes) BETWEEN ... AND... (specify the range of numeric values) LIKE IN (matches any values in the given list) Set ALL Tests whether all rows in the result set of a subquery meet the specified condition. ANY Tests whether one or more rows in the result set of a subquery meet the specified condition. Statistical SUM, COUNT, MIN, MAX, AVE Aggregate statistics Count: Count (*): count the number of the rows Count (distinct <column>): evaluate how many different attributes there are. Join (Equi-join, Multi-table join): Order of join optimizes the query but not affect the result Rule 1: Joined tables are listed in FROM clause. Rule 2: Attributes used to join tables are declared and matched in WHERE clause. 4. Spatial queries 4.1. Strategies for storing spatial data in relational tables Background: SQL language is limited to atomic data type a) Fully normalized Maintain the functionality and properties of traditional RDBM and fit the spatial data into relational schema. The problem is that data redundancy occurs. b) Georelational model: Separate spatial data and attribute data into different data model, such as Arc/Info No ability to use RDBM functionality such as query, concurrency control etc. c) BLOB: Binary large object is a data type used in a RDBMS that consists of binary data type. Spatial data type could use this data type; however, spatial operation is difficult to perform since tedious decoding is necessary to convert back to spatial data, in another word, SQL cannot handle blob formed data. d) Abstract data types: This can be integrated into ORDBMS. SQL3 can handle this data type. 4.2. Extending SQL for spatial data a) OGIS data model: industry-wide standard of geometry data types, spatial operation, currently supported by major DBMS vendors. - Basic geometry 4 sub-classes: Point, curve, surface and geometryCollection Conceptual versus substantiated entities - Operators Three categories 1. Basic operation that is applied to all geometry types Envelope(), 2. Test regarding topological relationships, return true or false 3. Spatial data analysis (area, distance, buffer, convexHull, etc) Spatial analysis queries Area (geometry) Distance (geometry1, geometry2) Topological queries Touch Cross Combined spatial analysis & topological operations Buffer & overlap Aggregate spatial queries Nested spatial query Exercise 1. Express the following query in RA a) Find all countries whose GDP is greater than $500 but less than $1000. π Name (σ GDP > 500 ∩ GDP < 1000 (Country)) b)List the life expectancy in countries that have rivers originating in them C = Country R = River π Name, Life-Exp (σ Co. Name = R. Origin (C x R)) c)Find all cities that are either in South America or whose population is less than two million Ci = π Name (σ Pop < 2 (City)) Co = π Name (σ Country. Cont = SAM (Country x City)) U = Ci U Co d)List all cities which are not in South America Co = π Name (σ Con ≠ SAM (Country)) Ci = π Name, Country (City) U = π Ci. Name (σ Ci. Country = Co. Name (Co x Ci)) Exercise 2 Express in SQL a) Find all countries whose GDP is greater than $500 but less than $1000. SELECT Name FROM Country WHERE GEP Between 500 AND 1000 b) List the life expectancy in countries that have rivers originating in them SELECT C. Life-Exp FROM Country C, River R WHERE C. Name = R. Origin c) Find all cities that are either in South America or whose population is less than two million SELECT Ci. Name FROM Country Co, City Ci WHERE Co. Name = Ci. Country AND (Co. Cont = SAM OR Ci. Pop < 2) d) List all cities which are not in South America SELECT Ci. Name FROM Country Co, City Ci WHERE Co. Name = Ci. Country AND Co. Cont <> SAM Exercise 3 Express in SQL a) Count the number of countries whose population is less than 100 million SELECT Count (*) FROM Country WHERE Pop < 100 b) Find the country in North America with the smallest GDP. Use nested query SELECT Name FROM Country WHERE Count = NAM AND GDP =< ALL (SELECT GDP FROM Country WHERE Count = NAM) c) List all countries that are in North America or whose capital cities have a population of less than 5 millions. SELECT Name FROM Country Co, City Ci WHERE Co. Name = Ci. Country AND (CO. Count = NAM OR (CI. Capital = ‘Y’ AND Ci. Pop < 5) d) Find the country with the second highest GDP. SELECT Name FROM Country WHERE GDP <> Max (GDP) AND GDP >= ALL (SELECT GDP FROM Country WHERE GDP <> Max (GDP)) Exercise 9. Express the following query in SQL, using OGIS extended data type and functions a) List all cities in the City table which are within five thousand miles of Washington D.C SELECT Ci. Name FROM City Ci, City Ci1 WHERE Overlap (Ci. Point, Buffer (Ci1. Point, 5000)) = 1 AND Ci1. Name = 'Washington D.C.' b) What is the length of Rio Paranas in Argentina and Brazil? SELECT Co. Name, Length (intersection (R. Shape, Co. Shape)) As “Length” FROM River R, Country Co WHERE Cross (R. Shape, Co. Shape) = 1 AND (Co. Name = 'Argentina' OR Co. Name = 'Brazil') AND R. Name = 'Rio Paranas' c) Do Argentina and Brazil share a border? SELECT Count (Co1. Name) FROM Country Co1, Country Co2 WHERE = Touch (Co1. Shape, Co2. Shape) = 1 AND Co2. Name = 'Brazil' AND Co1. Name = ‘Argentina’ d) List the countries that lie completely south of the equator. SELECT Co. Name FROM Country Co WHERE = Exercise 10. Transform the following query into SQL, using OGIS specified data type and operations Give the schema. RIVER (NAME: char, FLOOD-PLAIN: polygon, GEOMETRY: line string) ROAD (ID: char, NAME: char, TYPE: char, GEOMETRY: line string) FOREST (NAME: char, GEOMETRY: polygon) LAND-PARCELS (ID: integer, GEOMETRY: polygon, county: char) a) Name all the rivers that cross Itasca State Forest SELECT R. Name FROM River R, Forest F WHERE Cross (R. Geometry, F. Geometry) = 1 AND F. Name = Itasca State Forest' b) Name all the tar roads that intersect Itasca State Forest SELECT F. Name FROM Forest F, Road R WHERE Cross (R. Geometry, F. Geometry) = 1 AND F. Name = Itasca State Forest' AND R. Type = ‘tar’ c) All roads which stretches within the floodplain of the river Montana are susceptible to flooding. Identify all these roads SELECT Ro. Name FROM River Ri, Road Ro WHERE Cross (Ro. Geometry, Ri. FLOOD-PLAIN) = 1 AND Ri. Name = ‘Montana' d) No urban development is allowed within two miles of the Red River and five miles of the Big Tree Park. Identify the land parcels and the county they are in that cannot be developed. SELECT LP. County, LP. Geometry FROM Land-Parcels LP, River R WHERE (Within (LP. Geometry, Buffer (R. Geometry, 2)) = 1 AND R. Name = ‘Red River’) OR (Within (LP. Geometry, Buffer (R. Geometry, 5)) = 1 AND R. Name = ‘Big Tree Park’) e) A river defines part of boundary of a county CREATE VIEW County AS SELECT * FROM Land-parcels GROUP BY County SELECT R. Name FROM River R, County C WHERE touch (R. Geometry ,C. Geometry)= 1 Exercise 18. For state park example, outline SQL DDL statements to create relevant tables using OGIS spatial data type. CREATE TABLE Forest( Name Varchar (35) Shape Polygon) CREATE TABLE River( Name Varchar (35) Shape LineString) CREATE TABLE Road( Name Varchar (35) Num_of_Lane Integer Shape LineString) CREATE TABLE Facility( Name Varchar (35) Shape Point) CREATE TABLE Fire-Station( Name Varchar (35) Shape Point) CREATE TABLE Forest-Stand( Stand-Id Integer Species Varchar (35) Shape Polygon) CREATE TABLE Manager( Name Varchar (35) Gender Varchar (5) Age Integer)