Database Systems – SQL SQL – STRUCTURED QUERY LANGUAGE INTRODUCTION SQL is a fairly “standard” query language for relational databases. Many databases implement SQL, but many databases implement various levels of the ANSI standards as well as adding additional functionality that may be a superset of the standard. Most of what will be taught here is standard across most mainstream databases. SQL provides the ability to: create tables insert data select data update data delete data drop tables create referential integrity Database Systems – SQL SQL – SELECT Note, SQL is not case sensitive Basic form of the SELECT statement is as follows: SELECT fieldlist FROM tablelist WHERE condition; or SELECT A1, A2, A3, … An FROM r1, r2, …, rm WHERE condition; Although not required, my style is to capitalize keywords Observe the following query which returns all website names from the websites table: SELECT website FROM websites; Database Systems – SQL SQL - SELECT SELECT website FROM websites; Returns all the website names in the table websites. The order is not guaranteed. website www.zojjed.com www.racewalk.com www.greattreks.com www.twofeetgallery.com www.walkinghealthy.com www.cs.drexel.edu/~jsalvage result set If you wish all of the fields to be returned from a query, use * instead of the field list. Therefore, to select all fields from the websites table, use the following query: SELECT * FROM websites; Database Systems – SQL SQL – SELECT To select all fields from the websites table, use the following query: SELECT * FROM websites; id_website website organization first_year category 1 www.zojjed.com Walking Promotions 2006 Fiction 2 www.racewalk.com Walking Promotions 1995 Health 3 www.greattreks.com Walking Promotions 2006 Travel 4 www.twofeetgallery.com Walking Promotions 2004 Photographs 5 www.walkinghealthy.com Walking Promotions 2002 Health 6 result set www.cs.drexel.edu/~jsalvage Drexel University 2005 Education Database Systems – SQL SQL – SELECT Write a query to return all the website names in the table customers assuming the customers table was set up similar to the customers relation when we discussed set theory. Note, the order is not guaranteed. The customer table has duplicate website names. Unlike in set theory, duplicates ARE returned from SQL by default. SELECT website FROM customers; website first-name website Derek last_nam e Jeter www.zojjed.com www.zojjed.com Chase Utley www.zojjed.com www.drexel.edu/~jsalvage Jeremy Johnson www.drexel.edu/~jsalvage www.racewalk.com Ryan Howard www.racewalk.com www.zojjed.com Ryan Howard www.zojjed.com customers table www.zojjed.com result set Database Systems – SQL SQL – SELECT – DISTINCT BY adding the DISTINCT keyword, only unique rows return from the SELECT query. This is similar to the behavior of set theory, but slows performance. The following query returns all the UNIQUE website names in the table customers. The order is not guaranteed. SELECT DISTINCT website FROM customers; website first_name website Derek last_nam e Jeter www.zojjed.com www.zojjed.com Chase Utley www.drexel.edu/~jsalvage www.drexel.edu/~jsalvage Jeremy Johnson www.racewalk.com www.racewalk.com Ryan Howard www.zojjed.com Ryan Howard customers table www.zojjed.com result set Database Systems – SQL SQL SELECT – MULTI TABLE The FROM clause defines the Cartesian product of the relations. So you can link two tables by listing them together (separated by a comma) like this: SELECT websites.website, websites.category, sales.first_name, sales.last_name FROM websites, sales WHERE websites.website = sales.website; This is the a form of the natural join. However, we are not allowed to use it! website category first_name last_name www.zojjed.com Fiction Derek Jeter www.zojjed.com Fiction Chase Utley www.zojjed.com Fiction Ryan Howard www.racewalk.com Health Ryan Howard www.cs.drexel.edu/~jsalvage Education Jeremy Johnson I will show you a better style. If you use this style on the exam, you will lose 50% of the points of a problem for each time you use it. Database Systems – SQL SQL SELECT – WHERE CLAUSE You can add additional constraints by adding clauses to the WHERE clause. So you can link to tables like this: SELECT websites.website, websites.category, sales.first_name, sales.last_name FROM websites, sales WHERE websites.website = sales.website AND websites.category = ‘Fiction’; website category first_name last_name www.zojjed.com Fiction Derek Jeter www.zojjed.com Fiction Chase Utley www.zojjed.com Fiction Ryan Howard Database Systems – SQL SQL SELECT – WHERE CLAUSE Do you think there is a difference in the result set between the following two SQL statements? SELECT websites.website, websites.category, sales.first_name, sales.last_name FROM websites, sales WHERE websites.website = sales.website AND websites.website =‘www.racewalk.com’; and SELECT websites.website, websites.category, sales.first_name, sales.last_name FROM websites, sales WHERE websites.website = sales.website AND sales.website = ‘www.racewalk.com’; Note, the typographical difference between the two statements is the table specifying the comparison for the website name. Database Systems – SQL SQL SELECT – WHERE CLAUSE Do you think there is a difference in the result set between the following two SQL statements? SELECT websites.website, websites.category, sales.first_name, sales.last_name FROM websites, sales WHERE websites.website = sales.website AND websites.website =‘www.racewalk.com’; and SELECT websites.website, websites.category, sales.first_name, sales.last_name FROM websites, sales WHERE websites.website = sales.website and sales.website = ‘www.racewalk.com’; Note, the typographical difference between the two statements is the table specifying the comparison for the website name. There is no difference between the result sets because only records where the website matches in both tables are included. Therefore comparing either to a scalar value will return in the same result set. Database Systems – SQL SQL SELECT - ALIASES Sometimes typing out the entire tablename on each column is burdensome. When you do not select columns that exist in both tables you do not have to list the table qualifier. However, when the column exists in both tables then you must list it. A short cut to listing the table name is to locally rename the table name within the query using the AS keyword. Observe the previous query rewritten with the AS keyword SELECT w.website, w.category, s.first_name, s.last_name FROM websites AS w, sales AS s WHERE w.website = s.website and w.website = ‘www.racewalk.com’; In addition, we can further simplify this by removing the currently unnecessary table specifications as follows: SELECT w.website, category, first_name, last_name FROM websites AS w, sales AS s WHERE w.website = s.website and w.website = ‘www.racewalk.com’; However, this is bad form. Why? Database Systems – SQL SQL SELECT - ALIASES Why is it bad to leave a column without a table specification even if it is the only column with that specification? The problem is when tables get modified in the future. If a table is modified and the column is added, then previously working SQL statements can fail. Therefore, always fully specify your columns with the table names they are selected from. Database Systems – SQL SQL SELECT - NULLs When we discussed set theory we said that NULLs can exist, but we were going to ignore them. In databases ignoring NULLs can create problems. When we design out databases, key fields should not contain NULLs. We will discuss this more later. For know, if you want to check for NULL use the IS NULL or IS NOT NULL comparison of the WHERE clause The following query returns the website and category of all websites that has a NULL value for an organization: SELECT website, category FROM websites WHERE organization IS NULL; The following query returns the website and category of all websites that has a nonNULL value for an organization: SELECT website, category FROM websites WHERE organization IS NOT NULL; Database Systems – SQL SQL SELECT - ALIASES The AS keyword, aka rename, is also used in the same manner as in set theory. This means we can select from two representations of the same table within a single SQL SELECT statement. One of the issues with SQL is not that the language is difficult to understand, but sometimes it is difficult to get it to do what you want it to. Find all names of all branches that have branch assets greater than at least one branch located in Brooklyn. branch city assets Center City Philadelphia 5,000,000 North East Philly Philadelphia 1,000,000 Cropsy Ave Brooklyn 10,000,000 Bay Parkway Brooklyn 7,500,000 Park Slope Brooklyn 3,500,000 Medford Medford 1,250,000 branches table Database Systems – SQL SQL SELECT – ALIASES Find all names of all branches that have branch assets greater than at least one branch located in Brooklyn. SELECT distinct T.branch FROM branch AS T, branch AS S WHERE T.assets > S.assets AND S.city = ‘Brooklyn’; branch-name city assets branch-name city assets Center City Philadelphia 5,000,000 Cropsy Ave Brooklyn 10,000,000 North East Philly Philadelphia 1,000,000 Bay Parkway Brooklyn 7,5000,000 Cropsy Ave Brooklyn 10,000,000 Bay Parkway Brooklyn 7,500,000 Park Slope Brooklyn 3,500,000 Medford Medford 1,250,000 branches table Center City result set Philadelphia 5,000,000 Database Systems – SQL The WHERE clause of the SQL statement has many options. Sometimes we do not wish to match a string exactly, instead we use a wildcard character. % - the character matches a substring In addition, instead of using the = operator, replace = with LIKE Therefore, if you wish to select only those branches that start with the letter ‘C’ from the branches table, you would use the following query: SELECT branch FROM branches WHERE branch LIKE “C%”; branch city assets branch Center City Philadelphia 5,000,000 Center City North East Philly Philadelphia 1,000,000 Cropsy Ave Cropsy Ave Brooklyn 10,000,000 Bay Parkway Brooklyn 7,500,000 Park Slope Brooklyn 3,500,000 Medford Medford 1,250,000 branches table result set Database Systems – SQL SQL SELECT - WILDCARDS The % wildcard character can also be used to search for strings that contain a certain value. To search for a specific value, place the % symbol before and after the value you wish to search for. The following query returns all branch-names that contain the string “Park” SELECT branch FROM branches WHERE branch LIKE “%Park%”; branch city assets branch city assets Center City Philadelphia 5,000,000 Bay Parkway Brooklyn 7,500,000 North East Philly Philadelphia 1,000,000 Park Slope Brooklyn 3,500,000 Cropsy Ave Brooklyn 10,000,000 Bay Parkway Brooklyn 7,500,000 Park Slope Brooklyn 3,500,000 Medford Medford 1,250,000 branches table result set Database Systems – SQL SQL SELECT - WILDCARDS The % wildcard character can also be used to look for special characters like the percent symbol itself, the blackslash, or the double quote. To use a special symbol, place a backslash before the symbol you wish to search for as in the following examples: like ‘ab\%cd%’ matches all strings starting with ab%cd like ‘ab\\cd%’ matches all strings starting with ab\cd In addition, you can use the underscore character to match a specific number of characters. For example: _ - the character matches a single character _ _ _ matches three characters _ _ _ % any string of at least three characters _ _ _.racewalk.com matches any address containing .racewalk.com with 3 characters leading up to it. Database Systems – SQL SQL SELECT – ORDER BY Often we wish to sort the results. This is easily accomplished adding the ORDER BY clause to a SELECT statement. While some dialects of SQL allow the ORDER BY clause to be placed in multiple places within the SQL statement, always place it after the WHERE clause to be sure. The general form of the SQL SELECT with an ORDER BY clause is as follows: SELECT field list FROM table list WHERE predicate ORDER BY field list; The default ordering is ascending. SELECT * FROM branches WHERE city = “Brooklyn” ORDER BY branch; branch city assets branch city assets Center City Philadelphia 5,000,000 Bay Parkway Brooklyn 7,500,000 North East Philly Philadelphia 1,000,000 Cropsy Ave Brooklyn 10,000,000 Cropsy Ave Brooklyn 10,000,000 Park Slope Brooklyn 3,500,000 Bay Parkway Brooklyn 7,500,000 Park Slope Brooklyn 3,500,000 Medford Medford 1,250,000 branches table result set Database Systems – SQL SQL SELECT – ORDER BY You can also order more than one field. If you want to order the results by city and then assets, you can use the following query SELECT * FROM branches ORDER BY city, assets; branch city assets branch city assets Center City Philadelphia 5,000,000 Park Slope Brooklyn 3,500,000 North East Philly Philadelphia 1,000,000 Bay Parkway Brooklyn 7,5000,000 Cropsy Ave Brooklyn 10,000,000 Cropsy Ave Brooklyn 10,000,000 Bay Parkway Brooklyn 7,500,000 Medford Medford 1,250,000 Park Slope Brooklyn 3,500,000 North East Philly Philadelphia 1,000,000 Medford Medford 1,250,000 Center City Philadelphia 5,000,000 branches table result set Database Systems – SQL SQL SELECT – ORDER BY If you wish to order the results in descending order, add DESC in front of the attributes you wish to sort in descending order. The following query returns all rows from the branches table sorted in ascending order of city, but descending order on assets. SELECT * FROM branches ORDER BY city, assets DESC; branch city assets branch city assets Center City Philadelphia 5,000,000 Cropsy Ave Brooklyn 10,000,000 North East Philly Philadelphia 1,000,000 Bay Parkway Brooklyn 7,5000,000 Cropsy Ave Brooklyn 10,000,000 Park Slope Brooklyn 3,500,000 Bay Parkway Brooklyn 7,500,000 Medford Medford 1,250,000 Park Slope Brooklyn 3,500,000 Center City Philadelphia 5,000,000 Medford Medford 1,250,000 North East Philly Philadelphia 1,000,000 branches table result set Database Systems – SQL SQL SELECT – INNER JOIN Instead of joining two tables with the WHERE clause, in this class you are required join tables with one of the JOIN clauses. The INNER JOIN syntax is as follows: SELECT fieldlist FROM (table1 INNER JOIN table2 ON table1.join-field = table2.join-field); To select all employees, city, and team from the cities and teams tables use the following query: SELECT cities.employee_name, cities.city, teams.employee_name, teams.team FROM (cities INNER JOIN teams ON cities.employee_name = teams.employee_name); employee_name city employee_name team Jeter New York City Glavin Mets Howard Philadelphia Howard Phillies Utley Philadelphia Bonds Giants Schilling Boston Schilling Choke Sox cities table teams table Database Systems – SQL SQL SELECT – INNER JOIN To select all employees, city, and team from the cities and teams tables use the following query: SELECT cities.employee_name, cities.city, teams.employee_name, teams.team FROM (cities INNER JOIN teams ON cities.employee_name = teams.employee_name); employee_name city employee_name team Jeter New York City Glavin Mets Howard Philadelphia Howard Phillies Utley Philadelphia Bonds Giants Schilling Boston Schilling Choke Sox teams table cities table employee_name city employee_name team Howard Philadelphia Howard Phillies Schilling Boston Schilling Choke Sox result set The inner join omits records that do not match, so we do not have records for Jeter, Utley, Glavin, or Bonds. Database Systems – SQL SQL SELECT – NATURAL INNER JOIN When you select data from two tables joined on a field with the same name, your inner join is called a natural inner join. I do not recommend using the natural inner join clause, because if the table structure changes and fields are added that then match between the tables that are joined, then your join clause would change unintentionally. However, since you may see the NATURAL INNER JOIN clause here it is: SELECT fieldlist FROM (table1 NATURAL INNER JOIN table2); Therefore, you could select all employees, city, and team from the cities and teams tables use the following query: SELECT cities.employee_name, cities.city, teams.employee_name, teams.team FROM (cities NATURAL INNER JOIN teams); Do not use the NATURAL INNER JOIN clause on an exam. You will lose 50% for each problem you use it. Database Systems – SQL SQL SELECT - LEFT OUTER JOIN The problem with using the INNER JOIN is when you need all the records from one table and any data that exists from the joined table. Imagine we added a table called batting employee_name city employee_name home_runs Jeter New York City Jeter 325 Howard Philadelphia Howard 150 Utley Philadelphia Utley 82 Schilling Boston cities table batting table To find out the employee-name, city, and # of home runs for each employee of Major League Baseball you might think we could use the following query: SELECT cities.employee_name, cities.city, batting.home_runs FROM (cities INNER JOIN batting ON cities.employee_name = batting.employee_name); Database Systems – SQL SQL SELECT – LEFT OUTER JOIN SELECT cities.employee_name, cities.city, batting.home_runs FROM (cities INNER JOIN batting ON cities.employee_name = batting.employee_name); However, the result set from the query is as follows: employee_name city employee_name home_runs Jeter New York City Jeter 325 Howard Philadelphia Howard 150 Utley Philadelphia Utley 82 Schilling Boston batting table cities table employee_name city home_runs Jeter New York City 325 Howard Philadelphia 150 Utley Philadelphia 82 result set Notice poor bleeding-sock Schilling is missing! The same would have happened if you used a WHERE clause to join the two tables as follows: SELECT cities.employee_name, cities.city, batting.home_runs FROM cities, home_runs WHERE cities.employee_name = batting.employee_name; Database Systems – SQL SQL SELECT - LEFT OUTER JOIN The main reason to learn the INNER JOIN clause is because there are other forms of JOIN clauses. Once you learn the syntax for one join, it is basically the same for all others. The LEFT OUTER JOIN syntax is as follows: SELECT fieldlist FROM (table1 LEFT OUTER JOIN table2 ON table1.join-field = table2.join-field); Database Systems – SQL SQL SELECT – LEFT OUTER JOIN To select all employees and cities in the cities table as well as any teams that exist for those employees use the following query: SELECT cities.employee_name, cities.city, teams.employee_name, teams.team FROM (cities LEFT OUTER JOIN teams ON cities.employee_name = teams.employee_name); employee_name city employee_name team Jeter New York City Glavin Mets Howard Philadelphia Howard Phillies Utley Philadelphia Bonds Giants Schilling Boston Schilling Choke Sox teams table cities table employee_name city employee_name team Jeter New York City Null Null Howard Philadelphia Howard Phillies Utley Philadelphia Null Null Schilling Boston Schilling Choke Sox result set Includes all records from the left and only those records on the right that match. Database Systems – SQL SQL SELECT – LEFT OUTER JOIN A great use of the LEFT OUTER JOIN is to determine what records are in the LEFT table, but not in the right table. SELECT fieldlist FROM (table1 LEFT OUTER JOIN table2 ON table1.join-field = table2.join-field) WHERE table2.join-field IS NULL; So to select the employees in cities that are not in the teams table use the following query: SELECT cities.employee_name FROM (cities LEFT OUTER JOIN teams ON cities.employee_name = teams.employee_name) WHERE teams.employee_name IS NULL; employee_name city employee_name team employee_name Jeter New York City Glavin Mets Jeter Howard Philadelphia Howard Philadelphia Utley Utley Philadelphia Bonds Giants Schilling Boston Schilling Choke Sox cities table teams table result Set Database Systems – SQL SQL SELECT – RIGHT OUTER JOIN The RIGHT OUTER JOIN is basically the mirror image of the LEFT OUTER JOIN. The RIGHT OUTER JOIN syntax is as follows: SELECT fieldlist FROM (table1 RIGHT OUTER JOIN table2 ON table1.join-field = table2.join-field); Stylistically, I rarely use it. Any RIGHT OUTER JOIN can be rewritten as a LEFT OUTER JOIN by simply switching the order of the tables. Therefore the previous statement could be rewritten as a LEFT OUTER JOIN as follows: SELECT fieldlist FROM (table2 LEFT OUTER JOIN table1 ON table1.join-field = table2.join-field); Database Systems – SQL SQL SELECT – RIGHT OUTER JOIN To select all employees and teams in the teams table as well as any cities that exist for those employees use the following query: SELECT cities.employee_name, cities.city, teams.employee_name, teams.team FROM (teams RIGHT OUTER JOIN cities ON teams.employee_name = cities.employee_name); employee_name team employee_name city Glavin Mets Howard Phillies Bonds Giants Schilling Choke Sox Jeter New York City Howard Philadelphia Utley Philadelphia Schilling Boston cities table employee_name city employee_name team Null Null Glavin Mets Howard Philadelphia Howard Phillies Null Null Bonds Giants Schilling Boston Schilling Choke Sox teams relation result set Includes all records from the right and only those records on the left that match Database Systems – SQL SQL SELECT – FULL OUTER JOIN Sometimes you want data from both sides of the join regardless if the other side exists. Personally, I have found this join less useful. The FULL OUTER JOIN syntax is as follows: SELECT fieldlist FROM (table1 FULL OUTER JOIN table2 ON table1.join-field = table2.join-field); employee_name city employee_name team Jeter New York City Glavin Mets Howard Philadelphia Howard Phillies Utley Philadelphia Bonds Giants Schilling Boston Schilling Choke Sox cities table teams relation Database Systems – SQL SQL SELECT – FULL OUTER JOIN To select all employees, teams and cities that exist any employees use the following query: SELECT cities.employee_name, cities.city, teams.employee_name, teams.team FROM (teams FULL OUTER JOIN cities ON teams.employee_name = cities.employee_name); employee_name city team New York City employee_nam e Glavin Jeter Howard Philadelphia Howard Phillies Utley Philadelphia Bonds Giants Schilling Boston Schilling Choke Sox cities table Mets teams relation employee_name Jeter city employee_name team New York City Null Null Utley Philadelphia Null Null Null Null Glavin Mets Howard Philadelphia Howard Phillies Null Null Bonds Giants Schilling Boston Schilling Choke Sox result set Includes all records from the right and the left of the join. For records that do not match, Null values replace the attributes that are missing. Database Systems – SQL SQL SELECT - EXPRESSIONS It is often convenient to perform operations on a field(s) within the query itself. The following operations exist: Name Description / Division Operator DIV Integer Division - Minus % Modulo Operator + Addition Operator * Multiply Operator - Change of Sign Note, If one of the values in your arithmetic statement is NULL, your result will equal NULL. Database Systems – SQL SQL SELECT - EXPRESSIONS Observe how we use the + operator to compute the final price of a item. SELECT retail-cost, tax, retail-cost + tax AS final-price FROM sales; Note the use of the AS operator to rename the expression to something readable and meaningful in the context of the SQL statement. In addition, there exists numerous mathematical functions. A few are listed here, look the rest up. Name Description ABS() Absolute value EXP() Exponent LN() Natural logarithm RAND() Random number ROUND() Rounds the argument SQRT() Square root of the argument Observe how we compute the diagonal of a TV where we have the width and height of the set. SELECT SQRT(width*width + height*height) AS diagonal, TV FROM televisions; Database Systems – SQL SQL SELECT - STRING FUNCTIONS There are many string functions that can be applied to character fields in a SELECT query. Here are some of the most useful Name Description LEFT Returns the leftmost n characters of the string LPAD Returns the string padded from the left up to n characters with the string provided LOWER Returns the string in all lowercase characters RIGHT Returns the rightmost n characters of the string RPAD Returns the string padded from the right up to n characters with the string provided SUBSTR Returns the substring specified UPPER Returns the string in all uppercase characters Here are a few examples: SELECT SUBSTR(website, 1, 3) AS extension FROM websites; SELECT RPAD(product, 20, ‘.’) FROM products; SELECT LOWER(website) AS website FROM websites;