Database Systems – Set Theory

Database Systems – SQL
SQL – STRUCTURED QUERY LANGUAGE
INTRODUCTION
SQL is a fairly “standard” query language for relational databases.
Many databases implement SQL, but they conform to the ANSI standards to varying degrees,
and many add functionality beyond the standard.
Most of what will be taught here is standard across most mainstream databases.
SQL provides the ability to:
create tables
insert data
select data
update data
delete data
drop tables
create referential integrity
SQL – SELECT
Note: SQL keywords are not case sensitive.
The basic form of the SELECT statement is as follows:
SELECT fieldlist FROM tablelist WHERE condition;
or
SELECT A1, A2, A3, … An
FROM r1, r2, …, rm
WHERE condition;
Although not required, my style is to capitalize keywords
Observe the following query which returns all website names from the websites table:
SELECT website
FROM websites;
The order of the rows in the result set is not guaranteed.
website
www.zojjed.com
www.racewalk.com
www.greattreks.com
www.twofeetgallery.com
www.walkinghealthy.com
www.cs.drexel.edu/~jsalvage
result set
If you wish all of the fields to be returned from a query, use * instead of the
field list.
Therefore, to select all fields from the websites table, use the following
query:
SELECT * FROM websites;
id_website  website                      organization        first_year  category
1           www.zojjed.com               Walking Promotions  2006        Fiction
2           www.racewalk.com             Walking Promotions  1995        Health
3           www.greattreks.com           Walking Promotions  2006        Travel
4           www.twofeetgallery.com       Walking Promotions  2004        Photographs
5           www.walkinghealthy.com       Walking Promotions  2002        Health
6           www.cs.drexel.edu/~jsalvage  Drexel University   2005        Education
result set
Write a query to return all the website names in the customers table, assuming the
table is set up like the customers relation from our discussion of set theory.
Note that the order is not guaranteed.
The customer table has duplicate website names. Unlike in set theory, duplicates ARE
returned from SQL by default.
SELECT website
FROM customers;
first_name  last_name  website
Derek       Jeter      www.zojjed.com
Chase       Utley      www.zojjed.com
Jeremy      Johnson    www.drexel.edu/~jsalvage
Ryan        Howard     www.racewalk.com
Ryan        Howard     www.zojjed.com
customers table

website
www.zojjed.com
www.zojjed.com
www.drexel.edu/~jsalvage
www.racewalk.com
www.zojjed.com
result set
SQL – SELECT – DISTINCT
By adding the DISTINCT keyword, only unique rows are returned from the SELECT query.
This matches the behavior of set theory, but removing duplicates can slow the query.
The following query returns all the UNIQUE website names in the table customers.
The order is not guaranteed.
SELECT DISTINCT website
FROM customers;
first_name  last_name  website
Derek       Jeter      www.zojjed.com
Chase       Utley      www.zojjed.com
Jeremy      Johnson    www.drexel.edu/~jsalvage
Ryan        Howard     www.racewalk.com
Ryan        Howard     www.zojjed.com
customers table

website
www.zojjed.com
www.drexel.edu/~jsalvage
www.racewalk.com
result set
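The difference between the two queries can be checked with a small script. This is only a
sketch: it assumes Python's built-in sqlite3 module and an in-memory copy of the customers
table, and the variable names are illustrative.

```python
import sqlite3

# In-memory database holding a copy of the customers table from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (first_name TEXT, last_name TEXT, website TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Derek", "Jeter", "www.zojjed.com"),
        ("Chase", "Utley", "www.zojjed.com"),
        ("Jeremy", "Johnson", "www.drexel.edu/~jsalvage"),
        ("Ryan", "Howard", "www.racewalk.com"),
        ("Ryan", "Howard", "www.zojjed.com"),
    ],
)

# Without DISTINCT every row contributes a value, duplicates included.
all_sites = [row[0] for row in conn.execute("SELECT website FROM customers")]

# With DISTINCT each website appears once. Order is not guaranteed,
# so the result is sorted in Python before it is printed.
unique_sites = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT website FROM customers")
)

print(len(all_sites), "rows without DISTINCT")
print(unique_sites)
```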
SQL SELECT – MULTI TABLE
The FROM clause defines the Cartesian product of the relations.
So you can link two tables by listing them together (separated by a comma) like this:
SELECT websites.website, websites.category, sales.first_name, sales.last_name
FROM websites, sales
WHERE websites.website = sales.website;
This is a form of the natural join. However, we are not allowed to use it!
website                      category   first_name  last_name
www.zojjed.com               Fiction    Derek       Jeter
www.zojjed.com               Fiction    Chase       Utley
www.zojjed.com               Fiction    Ryan        Howard
www.racewalk.com             Health     Ryan        Howard
www.cs.drexel.edu/~jsalvage  Education  Jeremy      Johnson
I will show you a better style. If you use this style on the exam, you will lose
50% of the points of a problem for each time you use it.
SQL SELECT – WHERE CLAUSE
You can add constraints by adding conditions to the WHERE clause.
So you can link two tables and filter the result like this:
SELECT websites.website, websites.category, sales.first_name, sales.last_name
FROM websites, sales
WHERE websites.website = sales.website AND websites.category = 'Fiction';
website         category  first_name  last_name
www.zojjed.com  Fiction   Derek       Jeter
www.zojjed.com  Fiction   Chase       Utley
www.zojjed.com  Fiction   Ryan        Howard
Do you think there is a difference in the result set between the following two SQL
statements?
SELECT websites.website, websites.category, sales.first_name, sales.last_name
FROM websites, sales
WHERE websites.website = sales.website AND websites.website = 'www.racewalk.com';
and
SELECT websites.website, websites.category, sales.first_name, sales.last_name
FROM websites, sales
WHERE websites.website = sales.website AND sales.website = 'www.racewalk.com';
Note that the only difference between the two statements is which table qualifies the
website column in the comparison.
There is no difference between the result sets, because only records where the website
matches in both tables are included. Therefore comparing either column to a scalar value
returns the same result set.
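To see the equivalence in practice, here is a minimal sketch using Python's built-in
sqlite3 module. The tables are cut down to the columns the query needs and to a few rows;
the BASE template and the variable names are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE websites (website TEXT, category TEXT);
    CREATE TABLE sales (first_name TEXT, last_name TEXT, website TEXT);
    INSERT INTO websites VALUES
        ('www.zojjed.com', 'Fiction'),
        ('www.racewalk.com', 'Health');
    INSERT INTO sales VALUES
        ('Derek', 'Jeter', 'www.zojjed.com'),
        ('Ryan', 'Howard', 'www.racewalk.com'),
        ('Ryan', 'Howard', 'www.zojjed.com');
""")

# The same query twice; only the table that qualifies the compared column changes.
BASE = """
    SELECT websites.website, websites.category, sales.first_name, sales.last_name
    FROM websites, sales
    WHERE websites.website = sales.website AND {column} = 'www.racewalk.com'
"""
via_websites = conn.execute(BASE.format(column="websites.website")).fetchall()
via_sales = conn.execute(BASE.format(column="sales.website")).fetchall()

print(via_websites)
print(via_sales)
```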
SQL SELECT - ALIASES
Typing out the entire table name on each column can be burdensome. When a column exists
in only one of the tables, you do not have to list the table qualifier. However, when the
column exists in both tables, you must list it.
A shortcut to listing the table name is to locally rename the table within the query
using the AS keyword.
Observe the previous query rewritten with the AS keyword:
SELECT w.website, w.category, s.first_name, s.last_name
FROM websites AS w, sales AS s
WHERE w.website = s.website AND w.website = 'www.racewalk.com';
In addition, we can further simplify this by removing the currently unnecessary table
specifications as follows:
SELECT w.website, category, first_name, last_name
FROM websites AS w, sales AS s
WHERE w.website = s.website AND w.website = 'www.racewalk.com';
However, this is bad form. Why?
Why is it bad to leave a column without a table specification, even if only one table in
the query has a column with that name?
The problem arises when tables are modified in the future. If a column with the same name
is later added to another table in the query, the reference becomes ambiguous and
previously working SQL statements can fail. Therefore, always fully specify your columns
with the table names they are selected from.
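The failure can be reproduced in a few lines. This sketch assumes Python's built-in
sqlite3 module and a hypothetical schema change in which the sales table gains its own
category column. The exact error message is SQLite's and other databases word it
differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE websites (website TEXT, category TEXT);
    CREATE TABLE sales (first_name TEXT, last_name TEXT, website TEXT);
    INSERT INTO websites VALUES ('www.racewalk.com', 'Health');
    INSERT INTO sales VALUES ('Ryan', 'Howard', 'www.racewalk.com');
""")

# category, first_name and last_name are left unqualified on purpose.
QUERY = """
    SELECT w.website, category, first_name, last_name
    FROM websites AS w, sales AS s
    WHERE w.website = s.website
"""
before = conn.execute(QUERY).fetchall()  # works: category exists only in websites

# Hypothetical schema change: sales gains its own category column.
conn.execute("ALTER TABLE sales ADD COLUMN category TEXT")

try:
    conn.execute(QUERY).fetchall()
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)  # the unqualified column is now ambiguous

print(before)
print(error)
```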
SQL SELECT - NULLs
When we discussed set theory we said that NULLs can exist, but we were going to
ignore them. In databases ignoring NULLs can create problems.
When we design our databases, key fields should not contain NULLs. We will discuss
this more later.
For now, if you want to check for NULL, use the IS NULL or IS NOT NULL comparison
in the WHERE clause.
The following query returns the website and category of all websites that have a NULL
value for an organization:
SELECT website, category
FROM websites
WHERE organization IS NULL;
The following query returns the website and category of all websites that have a non-NULL
value for an organization:
SELECT website, category
FROM websites
WHERE organization IS NOT NULL;
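The reason for IS NULL rather than = is that a comparison with NULL is never true. The
sketch below shows this with Python's built-in sqlite3 module. The second row and the
websites_where helper are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE websites (website TEXT, organization TEXT)")
conn.executemany(
    "INSERT INTO websites VALUES (?, ?)",
    [
        ("www.zojjed.com", "Walking Promotions"),
        ("www.example.org", None),  # hypothetical row; None is stored as NULL
    ],
)

def websites_where(condition):
    """Return the website names that satisfy the given WHERE condition."""
    sql = "SELECT website FROM websites WHERE " + condition
    return [row[0] for row in conn.execute(sql)]

is_null = websites_where("organization IS NULL")
not_null = websites_where("organization IS NOT NULL")
# Comparing with = never matches, because NULL = NULL is not true in SQL.
equals_null = websites_where("organization = NULL")

print(is_null, not_null, equals_null)
```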
SQL SELECT - ALIASES
The AS keyword, aka rename, is also used in the same manner as in set theory.
This means we can select from two representations of the same table within a single
SQL SELECT statement.
The difficulty with SQL is not that the language is hard to understand, but that it is
sometimes hard to get it to do what you want.
Find the names of all branches that have assets greater than at least one branch located
in Brooklyn.
branch             city          assets
Center City        Philadelphia   5,000,000
North East Philly  Philadelphia   1,000,000
Cropsy Ave         Brooklyn      10,000,000
Bay Parkway        Brooklyn       7,500,000
Park Slope         Brooklyn       3,500,000
Medford            Medford        1,250,000
branches table
SELECT DISTINCT T.branch
FROM branches AS T, branches AS S
WHERE T.assets > S.assets AND S.city = 'Brooklyn';
branch             city          assets
Center City        Philadelphia   5,000,000
North East Philly  Philadelphia   1,000,000
Cropsy Ave         Brooklyn      10,000,000
Bay Parkway        Brooklyn       7,500,000
Park Slope         Brooklyn       3,500,000
Medford            Medford        1,250,000
branches table

branch
Cropsy Ave
Bay Parkway
Center City
result set
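The self-join can be tried out with a short script. This is a sketch that assumes Python's
built-in sqlite3 module and stores assets as plain integers. The result is sorted in
Python because the query itself gives no ordering guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branches (branch TEXT, city TEXT, assets INTEGER)")
conn.executemany(
    "INSERT INTO branches VALUES (?, ?, ?)",
    [
        ("Center City", "Philadelphia", 5000000),
        ("North East Philly", "Philadelphia", 1000000),
        ("Cropsy Ave", "Brooklyn", 10000000),
        ("Bay Parkway", "Brooklyn", 7500000),
        ("Park Slope", "Brooklyn", 3500000),
        ("Medford", "Medford", 1250000),
    ],
)

# T and S are two names for the same table. Each T row is compared with
# every Brooklyn row in S; DISTINCT collapses branches that beat several.
rows = conn.execute("""
    SELECT DISTINCT T.branch
    FROM branches AS T, branches AS S
    WHERE T.assets > S.assets AND S.city = 'Brooklyn'
""").fetchall()

richer = sorted(row[0] for row in rows)
print(richer)
```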
The WHERE clause of the SQL statement has many options.
Sometimes we do not wish to match a string exactly; instead we use a wildcard
character.
% - matches any substring, including the empty string
In addition, use the LIKE operator in place of the = operator.
Therefore, if you wish to select only those branches that start with the letter 'C' from
the branches table, you would use the following query:
SELECT branch
FROM branches
WHERE branch LIKE 'C%';
branch             city          assets
Center City        Philadelphia   5,000,000
North East Philly  Philadelphia   1,000,000
Cropsy Ave         Brooklyn      10,000,000
Bay Parkway        Brooklyn       7,500,000
Park Slope         Brooklyn       3,500,000
Medford            Medford        1,250,000
branches table

branch
Center City
Cropsy Ave
result set
SQL SELECT - WILDCARDS
The % wildcard character can also be used to search for strings that contain a certain
value.
To search for a specific value, place the % symbol before and after the value you wish
to search for.
The following query returns all branch names that contain the string 'Park':
SELECT branch
FROM branches
WHERE branch LIKE '%Park%';
branch             city          assets
Center City        Philadelphia   5,000,000
North East Philly  Philadelphia   1,000,000
Cropsy Ave         Brooklyn      10,000,000
Bay Parkway        Brooklyn       7,500,000
Park Slope         Brooklyn       3,500,000
Medford            Medford        1,250,000
branches table

branch
Bay Parkway
Park Slope
result set
LIKE patterns can also match special characters such as the percent symbol itself, the
backslash, or the double quote.
To match a special symbol literally, place a backslash before it, as in the following
examples:
LIKE 'ab\%cd%' matches all strings starting with ab%cd
LIKE 'ab\\cd%' matches all strings starting with ab\cd
Some databases treat the backslash as the escape character only if you add an ESCAPE
clause after the pattern.
In addition, you can use the underscore character to match a specific number of
characters.
For example:
_ - matches any single character
___ matches any string of exactly three characters
___% matches any string of at least three characters
___.racewalk.com matches any address made up of exactly three characters followed by
.racewalk.com
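The escape and underscore rules can be checked with a small script. This sketch assumes
Python's built-in sqlite3 module. SQLite has no default escape character, so the query
names the backslash explicitly with an ESCAPE clause. The samples table and the matching
helper are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (value TEXT)")
conn.executemany(
    "INSERT INTO samples VALUES (?)",
    [("ab%cd-1",), ("abXcd-2",), ("www.racewalk.com",), ("m.racewalk.com",)],
)

def matching(pattern):
    """Return the values matching a LIKE pattern, with backslash as the escape."""
    sql = "SELECT value FROM samples WHERE value LIKE ? ESCAPE '\\'"
    return sorted(row[0] for row in conn.execute(sql, (pattern,)))

escaped = matching("ab\\%cd%")        # literal percent sign after "ab"
unescaped = matching("ab%cd%")        # here the first % is a wildcard
three = matching("___.racewalk.com")  # exactly three characters, then the suffix

print(escaped)
print(unescaped)
print(three)
```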
SQL SELECT – ORDER BY
Often we wish to sort the results. This is easily accomplished by adding the ORDER BY
clause to a SELECT statement. While some dialects of SQL allow the ORDER BY clause
to be placed in more than one position within the SQL statement, always place it after
the WHERE clause to be sure. The general form of the SQL SELECT with an ORDER BY
clause is as follows:
SELECT field list
FROM table list
WHERE predicate
ORDER BY field list;
The default ordering is ascending.
SELECT *
FROM branches
WHERE city = 'Brooklyn'
ORDER BY branch;
branch             city          assets
Center City        Philadelphia   5,000,000
North East Philly  Philadelphia   1,000,000
Cropsy Ave         Brooklyn      10,000,000
Bay Parkway        Brooklyn       7,500,000
Park Slope         Brooklyn       3,500,000
Medford            Medford        1,250,000
branches table

branch       city      assets
Bay Parkway  Brooklyn   7,500,000
Cropsy Ave   Brooklyn  10,000,000
Park Slope   Brooklyn   3,500,000
result set
You can also order by more than one field. If you want to order the results by city and
then by assets, you can use the following query:
SELECT *
FROM branches
ORDER BY city, assets;
branch             city          assets
Center City        Philadelphia   5,000,000
North East Philly  Philadelphia   1,000,000
Cropsy Ave         Brooklyn      10,000,000
Bay Parkway        Brooklyn       7,500,000
Park Slope         Brooklyn       3,500,000
Medford            Medford        1,250,000
branches table

branch             city          assets
Park Slope         Brooklyn       3,500,000
Bay Parkway        Brooklyn       7,500,000
Cropsy Ave         Brooklyn      10,000,000
Medford            Medford        1,250,000
North East Philly  Philadelphia   1,000,000
Center City        Philadelphia   5,000,000
result set
If you wish to order the results in descending order, add DESC after each attribute
you wish to sort in descending order.
The following query returns all rows from the branches table sorted in ascending order
of city, but descending order on assets.
SELECT *
FROM branches
ORDER BY city, assets DESC;
branch             city          assets
Center City        Philadelphia   5,000,000
North East Philly  Philadelphia   1,000,000
Cropsy Ave         Brooklyn      10,000,000
Bay Parkway        Brooklyn       7,500,000
Park Slope         Brooklyn       3,500,000
Medford            Medford        1,250,000
branches table

branch             city          assets
Cropsy Ave         Brooklyn      10,000,000
Bay Parkway        Brooklyn       7,500,000
Park Slope         Brooklyn       3,500,000
Medford            Medford        1,250,000
Center City        Philadelphia   5,000,000
North East Philly  Philadelphia   1,000,000
result set
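The mixed ordering can be reproduced with a short script. This is a sketch that assumes
Python's built-in sqlite3 module and stores assets as plain integers. DESC applies only to
the attribute it follows, so city stays in the default ascending order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branches (branch TEXT, city TEXT, assets INTEGER)")
conn.executemany(
    "INSERT INTO branches VALUES (?, ?, ?)",
    [
        ("Center City", "Philadelphia", 5000000),
        ("North East Philly", "Philadelphia", 1000000),
        ("Cropsy Ave", "Brooklyn", 10000000),
        ("Bay Parkway", "Brooklyn", 7500000),
        ("Park Slope", "Brooklyn", 3500000),
        ("Medford", "Medford", 1250000),
    ],
)

# Ascending by city, then descending by assets within each city.
rows = conn.execute(
    "SELECT branch, city, assets FROM branches ORDER BY city, assets DESC"
).fetchall()

ordered = [row[0] for row in rows]
for row in rows:
    print(row)
```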
SQL SELECT – INNER JOIN
Instead of joining two tables with the WHERE clause, in this class you are required to
join tables with one of the JOIN clauses.
The INNER JOIN syntax is as follows:
SELECT fieldlist
FROM (table1 INNER JOIN table2
ON table1.join-field = table2.join-field);
To select the employee name, city, and team from the cities and teams tables, use the
following query:
SELECT cities.employee_name, cities.city, teams.employee_name, teams.team
FROM (cities INNER JOIN teams
ON cities.employee_name = teams.employee_name);
employee_name  city
Jeter          New York City
Howard         Philadelphia
Utley          Philadelphia
Schilling      Boston
cities table

employee_name  team
Glavin         Mets
Howard         Phillies
Bonds          Giants
Schilling      Choke Sox
teams table
employee_name  city          employee_name  team
Howard         Philadelphia  Howard         Phillies
Schilling      Boston        Schilling      Choke Sox
result set
The inner join omits records that do not match, so we do not have
records for Jeter, Utley, Glavin, or Bonds.
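The inner join can be run against the two small tables as follows. This is a sketch that
assumes Python's built-in sqlite3 module. An ORDER BY is added only so that the printed
rows come out in a predictable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (employee_name TEXT, city TEXT)")
conn.execute("CREATE TABLE teams (employee_name TEXT, team TEXT)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?)",
    [
        ("Jeter", "New York City"),
        ("Howard", "Philadelphia"),
        ("Utley", "Philadelphia"),
        ("Schilling", "Boston"),
    ],
)
conn.executemany(
    "INSERT INTO teams VALUES (?, ?)",
    [
        ("Glavin", "Mets"),
        ("Howard", "Phillies"),
        ("Bonds", "Giants"),
        ("Schilling", "Choke Sox"),
    ],
)

# Only employees present in BOTH tables survive the inner join.
joined = conn.execute("""
    SELECT cities.employee_name, cities.city, teams.team
    FROM cities INNER JOIN teams
        ON cities.employee_name = teams.employee_name
    ORDER BY cities.employee_name
""").fetchall()

print(joined)
```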
SQL SELECT – NATURAL INNER JOIN
A natural inner join joins two tables on every field that has the same name in both
tables, so you do not write an ON condition.
I do not recommend using the natural inner join clause, because if the table structure
changes and fields are added that match between the joined tables, your join condition
changes unintentionally.
However, since you may see the NATURAL INNER JOIN clause, here it is:
SELECT fieldlist
FROM (table1 NATURAL INNER JOIN table2);
Therefore, you could select the employee name, city, and team from the cities and teams
tables using the following query:
SELECT cities.employee_name, cities.city, teams.employee_name, teams.team
FROM (cities NATURAL INNER JOIN teams);
Do not use the NATURAL INNER JOIN clause on an exam. You will lose 50% on each
problem in which you use it.
SQL SELECT - LEFT OUTER JOIN
The INNER JOIN falls short when you need all the records from one table and any data
that exists for them in the joined table.
Imagine we added a table called batting:
employee_name  city
Jeter          New York City
Howard         Philadelphia
Utley          Philadelphia
Schilling      Boston
cities table

employee_name  home_runs
Jeter          325
Howard         150
Utley          82
batting table
To find the employee_name, city, and number of home runs for each employee of
Major League Baseball, you might think we could use the following query:
SELECT cities.employee_name, cities.city, batting.home_runs
FROM (cities INNER JOIN batting
ON cities.employee_name = batting.employee_name);
However, the result set from the query is as follows:
employee_name  city           home_runs
Jeter          New York City  325
Howard         Philadelphia   150
Utley          Philadelphia   82
result set
Notice poor bleeding-sock Schilling is missing! The same would have happened if
you used a WHERE clause to join the two tables as follows:
SELECT cities.employee_name, cities.city, batting.home_runs
FROM cities, batting WHERE cities.employee_name = batting.employee_name;
SQL SELECT - LEFT OUTER JOIN
The main reason to learn the INNER JOIN clause is that there are other forms of
JOIN clauses. Once you learn the syntax for one join, it is basically the same for all
the others.
The LEFT OUTER JOIN syntax is as follows:
SELECT fieldlist
FROM (table1 LEFT OUTER JOIN table2
ON table1.join-field = table2.join-field);
To select all employees and cities in the cities table, as well as any teams that exist
for those employees, use the following query:
SELECT cities.employee_name, cities.city, teams.employee_name, teams.team
FROM (cities LEFT OUTER JOIN teams
ON cities.employee_name = teams.employee_name);
employee_name  city           employee_name  team
Jeter          New York City  NULL           NULL
Howard         Philadelphia   Howard         Phillies
Utley          Philadelphia   NULL           NULL
Schilling      Boston         Schilling      Choke Sox
result set
The left outer join includes all records from the left table and only those records from
the right table that match.
A great use of the LEFT OUTER JOIN is to determine what records are in the LEFT
table, but not in the right table.
SELECT fieldlist
FROM (table1 LEFT OUTER JOIN table2
ON table1.join-field = table2.join-field)
WHERE table2.join-field IS NULL;
So to select the employees in cities that are not in the teams table use the following
query:
SELECT cities.employee_name
FROM (cities LEFT OUTER JOIN teams
ON cities.employee_name = teams.employee_name)
WHERE teams.employee_name IS NULL;
employee_name
Jeter
Utley
result set
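Both uses of the LEFT OUTER JOIN can be tried together. This is a sketch that assumes
Python's built-in sqlite3 module, in which SQL NULL appears as Python's None. The ORDER BY
clauses are added only to make the printed output predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (employee_name TEXT, city TEXT)")
conn.execute("CREATE TABLE teams (employee_name TEXT, team TEXT)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?)",
    [
        ("Jeter", "New York City"),
        ("Howard", "Philadelphia"),
        ("Utley", "Philadelphia"),
        ("Schilling", "Boston"),
    ],
)
conn.executemany(
    "INSERT INTO teams VALUES (?, ?)",
    [
        ("Glavin", "Mets"),
        ("Howard", "Phillies"),
        ("Bonds", "Giants"),
        ("Schilling", "Choke Sox"),
    ],
)

# Every cities row is kept; team is NULL (None) where no teams row matches.
left_rows = conn.execute("""
    SELECT cities.employee_name, teams.team
    FROM cities LEFT OUTER JOIN teams
        ON cities.employee_name = teams.employee_name
    ORDER BY cities.employee_name
""").fetchall()

# Keeping only the NULL side leaves the employees that have no team.
unmatched = [row[0] for row in conn.execute("""
    SELECT cities.employee_name
    FROM cities LEFT OUTER JOIN teams
        ON cities.employee_name = teams.employee_name
    WHERE teams.employee_name IS NULL
    ORDER BY cities.employee_name
""")]

print(left_rows)
print(unmatched)
```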
SQL SELECT – RIGHT OUTER JOIN
The RIGHT OUTER JOIN is basically the mirror image of the LEFT OUTER JOIN.
The RIGHT OUTER JOIN syntax is as follows:
SELECT fieldlist
FROM (table1 RIGHT OUTER JOIN table2
ON table1.join-field = table2.join-field);
Stylistically, I rarely use it. Any RIGHT OUTER JOIN can be rewritten as a LEFT OUTER
JOIN by simply switching the order of the tables.
Therefore the previous statement could be rewritten as a LEFT OUTER JOIN as follows:
SELECT fieldlist
FROM (table2 LEFT OUTER JOIN table1
ON table1.join-field = table2.join-field);
To select all employees and teams in the teams table, as well as any cities that exist
for those employees, use the following query:
SELECT cities.employee_name, cities.city, teams.employee_name, teams.team
FROM (cities RIGHT OUTER JOIN teams
ON cities.employee_name = teams.employee_name);
employee_name  city          employee_name  team
NULL           NULL          Glavin         Mets
Howard         Philadelphia  Howard         Phillies
NULL           NULL          Bonds          Giants
Schilling      Boston        Schilling      Choke Sox
result set
The right outer join includes all records from the right table and only those records
from the left table that match.
SQL SELECT – FULL OUTER JOIN
Sometimes you want data from both sides of the join regardless of whether a matching
record exists on the other side.
Personally, I have found this join less useful.
The FULL OUTER JOIN syntax is as follows:
SELECT fieldlist
FROM (table1 FULL OUTER JOIN table2
ON table1.join-field = table2.join-field);
To select all employees together with any cities and teams that exist for them, use the
following query:
SELECT cities.employee_name, cities.city, teams.employee_name, teams.team
FROM (teams FULL OUTER JOIN cities
ON teams.employee_name = cities.employee_name);
employee_name  city           employee_name  team
Jeter          New York City  NULL           NULL
Utley          Philadelphia   NULL           NULL
NULL           NULL           Glavin         Mets
Howard         Philadelphia   Howard         Phillies
NULL           NULL           Bonds          Giants
Schilling      Boston         Schilling      Choke Sox
result set
The full outer join includes all records from both the right and the left of the join.
For records that do not match, NULL values replace the attributes that are missing.
SQL SELECT - EXPRESSIONS
It is often convenient to perform operations on one or more fields within the query
itself. The following operations exist:
Name  Description
/     Division operator
DIV   Integer division
-     Minus (subtraction)
%     Modulo operator
+     Addition operator
*     Multiplication operator
-     Change of sign (unary minus)
Note: if one of the values in your arithmetic expression is NULL, the result is
NULL.
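The NULL rule is easy to overlook, so here is a minimal sketch of it using Python's
built-in sqlite3 module. The prices table, its rows, and the column names are hypothetical
and chosen only for the illustration. SQL NULL appears as Python's None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (item TEXT, retail_cost REAL, tax REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        ("book", 10.0, 0.5),
        ("gift card", 25.0, None),  # tax is unknown, stored as NULL
    ],
)

# Adding NULL to a number yields NULL, not the number itself.
totals = conn.execute("""
    SELECT item, retail_cost + tax AS final_price
    FROM prices
    ORDER BY item
""").fetchall()

print(totals)
```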
Observe how we use the + operator to compute the final price of an item.
SELECT retail_cost, tax, retail_cost + tax AS final_price
FROM sales;
Note the use of the AS keyword to rename the expression to something readable and
meaningful in the context of the SQL statement.
In addition, there are numerous mathematical functions. A few are listed here; look
the rest up in the documentation for your database.
Name     Description
ABS()    Absolute value
EXP()    e raised to the power of the argument
LN()     Natural logarithm
RAND()   Random number
ROUND()  Rounds the argument
SQRT()   Square root of the argument
Observe how we compute the diagonal of a TV where we have the width and
height of the set.
SELECT SQRT(width*width + height*height) AS diagonal, TV
FROM televisions;
SQL SELECT - STRING FUNCTIONS
There are many string functions that can be applied to character fields in a SELECT
query. Here are some of the most useful:
Name    Description
LEFT    Returns the leftmost n characters of the string
LPAD    Returns the string padded from the left up to n characters with the string provided
LOWER   Returns the string in all lowercase characters
RIGHT   Returns the rightmost n characters of the string
RPAD    Returns the string padded from the right up to n characters with the string provided
SUBSTR  Returns the substring specified
UPPER   Returns the string in all uppercase characters
Here are a few examples:
SELECT SUBSTR(website, 1, 3) AS prefix FROM websites;
SELECT RPAD(product, 20, '.') FROM products;
SELECT LOWER(website) AS website FROM websites;
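A few of these functions can be tried without a table. The sketch below assumes Python's
built-in sqlite3 module. SQLite provides SUBSTR, LOWER, and UPPER but not LPAD or RPAD, so
the padding line uses Python's str.ljust as a stand-in rather than the SQL function. The
sample strings are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SUBSTR counts positions from 1: start at character 1 and take 3 characters.
prefix, lowered, uppered = conn.execute(
    "SELECT SUBSTR('www.Zojjed.com', 1, 3),"
    "       LOWER('www.Zojjed.com'),"
    "       UPPER('www.Zojjed.com')"
).fetchone()

# Stand-in for RPAD(product, 10, '.'), done in Python because SQLite lacks RPAD.
padded = "widget".ljust(10, ".")

print(prefix, lowered, uppered, padded)
```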