Notes on SQL (Structured Query Language) By R. E. Wyllys @ University of Texas, Austin Background Work on what has become known as "SQL" can be traced back to the early 1970s, when Edgar F. Codd initiated the development of a formal theoretical structure for what we now know as relational databases. Codd's work on formalizing the principles of relational databases led him and others to realize that it would also be useful to develop a standard way of querying such databases. Along with Codd, such writers as Morton M. Astrahan, Raymond F. Boyce, and Donald D. Chamberlin were active in exploring ways of simplifying and standardizing the expression of queries. This exploration produced SQL, which is pronounced "sequel" and is an abbreviation of the phrase "structured query language" (the pronunciation stems from the fact that an early version of the language was named "SEQUEL"). As a proposed standard, SQL needed the blessing of the American National Standards Institute (ANSI). The standards-formulation process resulted in the publication by ANSI in 1983, 1986, and 1988 of successive drafts of a proposed standard for SQL: American National Standard for Information Systems-Database Language--SQL. ANSI X3.135. In 1989 ANSI and the International Standards Organization (ISO) of which ANSI is the member for the U.S.A., published a proposed standard for an extended version of SQL, which became known as SQL2 in 1992. At present, standards committees are working on still another version of SQL, SQL3; the target date for completion is 1998. The standards cover not only the terms used in the formal language itself, but also a host of details concerning technical definitions and usage. The present notes treat primarily the language terms. As you will see, SQL has become more than a mere query language; it has evolved into a language that is also used for defining databases and their contents. Most high-end personal computer databasemanagement systems (DBMSs) incorporate provisions for allowing their users to use SQL, although some of them require the user to call a special module into action in the DBMS in order to use SQL. All modern minicomputer and mainframe DBMSs incorporate SQL. Although SQL is widely used, it remains true that SQL by itself is not sufficient to deal with all the practical problems of managing actual files and user interfaces in any operating-system environment. Every implementation of SQL is part of a larger DBMS that is used for such needs as file maintenance, the provision of tailored data-entry and data-output (i.e., report) formats, screen-handling, printing, and network communications. Overview We begin with an overview of the basic vocabulary of SQL. The language contains commands, arithmetic expressions, keywords, logical connectives, and predicates (i.e., conditions that are evaluated to yield values of "true," "false", or "unknown"). COMMANDS /*...*/ Delimit a comment within or before a SQL command ALTER TABLE Add a column to a table or re-define a column in a table BEGIN TRAN Begin a transaction COMMIT TRAN Make permanent the change(s) resulting from a transaction CREATE INDEX Create an index on a column or columns in a table CREATE TABLE Create a table, including definition of its columns and other properties CREATE VIEW Define the logical equivalent of a table, consisting of columns selected from one or more tables (or from previously defined views); views are also called "virtual tables" DELETE FROM Delete rows from a table DROP DATABASE Delete a database DROP INDEX Delete an index DROP TABLE Delete a table DROP VIEW Delete a view GRANT Create users; assign passwords and privileges INSERT Add rows to a table or view LOAD DATABASE Prepare a database for use by the databasemanagement system as a currently accessible database LOAD TRAN Load a transaction from a specified device or source REVOKE Revoke database privileges ROLLBACK TRAN Eliminates noncommitted transactions (i.e., changes) to a database (used in case of system failure or to get rid of incorrect transactions) SAVE TRAN Save a transaction (in a separate file, awaiting the transaction's incorporation into the database) SELECT Select row(s) and/or column(s) from a table or tables UPDATE Change the value of one or more column entries in or more rows of a table UPDATE STATISTICS Update statistics on a table ARITHMETIC (VALUE) EXPRESSIONS + Addition - Subtraction * Multiplication / Division The hierarchy of arithmetic operations is: first, any expression within parentheses ( ); second, multiplication and division, performed left to right; and third, addition and subtraction, performed left to right. LOGICAL CONNECTIVES AND Logical AND (intersection) OR Logical OR (union) NOT Logical NOT PREDICATES = equal to <> not equal to < less than > greater than <= less than or equal to >= greater than or equal to ALL ANY BETWEEN...AND EXISTS, NOT EXISTS IN, NOT IN LIKE NULL SOME AGGREGATE FUNCTIONS AVG average of the values in a column COUNT count of the number of records meeting specified conditions COUNT(*) count of the number of records in a table MAX maximum of the values in a column MIN minimum of the values in a column SUM sum of the values in a column KEYWORDS (Reserved Words) All And Any As Asc Authorization AVG Begin Between By Char Character Check Close Commit Continue Count Create Current Cursor Datetime Dec Decimal Declare Delete Desc Distinct Double End Escape Exec Exists Fetch Float For Found From Go GOTO Grant Group Having In Insert Int Integer Into Is Like Max Min Module Not Null Numeric Of On Open Option Or Order Precision Privileges Procedure Public Real Rollback Schema Section Select Set Smallint Some SQL Sqlcode Sqlerror Sum Table To Union Unique Update User Values Varchar View Whenever Where With Work Commands Notation Used in Defining the Syntax of Commands The foregoing lists have given you an overview of SQL. Now we are going to look in detail at the major commands and how they are used. First, we need to consider the syntax used in providing formal definitions of the commands, a syntax that is widely used in computer and programming contexts: | A vertical bar means "or". { } Braces (curly brackets) delimiting words mean that you must choose at least one of the enclosed choices. The choices may be separated by "|" or ",". When used with braces, the "|" tells you that you must choose exactly one of the choices, and the "," tells you that you must choose one or more of the choices. [ ] Square brackets delimiting words mean that you may choose one or more of the enclosed choices. The choices may be separated by "|" or ",". When used with brackets, the "|" tells you that you may select at most one of the choices, and the "," tells you that you may select none, one, or several of them. ... Ellipses mean that you may repeat the previous item as many times as you wish. ( ) When parentheses "( )" are used in the definition of a command, this tells you that you must use the parentheses, as shown, in the command. 3.2 Definitions of Commands In what follows, commands are shown in upper- and lower-case letters. The upper-case part shows you what you must type as part of the command in SQL, although you may (in most implementations of SQL) type it in lower case. The lower-case parts of the command are what you provide specific expressions for when you type the command. ALTER TABLE table_name {ADD | MODIFY | DROP} (column_name datatype [NULL | NOT NULL] ) Ex: alter table employee_names add (emp_midinitial char(1) null) Note: In this example the specification "null" means that null values are acceptable in the emp_midinitial column. If you specify NOT NULL, then the DBMS will not allow that column to be left empty when a record is entered into the table. If neither "null" or "not null" is specified, then the default value used by the DBMS will be assigned. In the ANSI standard for SQL, the default is to accept null values in all columns that lack a specification as to null value. Most DBMSs follow this standard also in their non-SQL operations. Note: The ALTER command operates on only one column at a time; hence, if you wish, for example, to add two columns to a table, you must use two separate ALTER commands. BEGIN TRAN transaction_name Note: The BEGIN TRAN and COMMIT TRAN commands typically are used to bracket a set of SQL statements: e.g., begin tran sale_transaction SQL statement 1 SQL statement 2 commit tran sale_transaction In this example, SQL statements 1 and 2 take effect only on a temporary basis (e.g., in a buffer) till the COMMIT TRAN command places the results of statements 1 and 2 into the database. This is an important means of protecting against system crashes and for dealing with multi-part transactions that could cause damage if entered into the database in an incomplete state. COMMIT TRAN transaction_name Note: See BEGIN TRAN. CREATE DATABASE database_name Ex: create database library_catalog create database employees CREATE INDEX index_name ON table_name (column_name [, column_name]...) Ex: create index publisher_ndx on monographs (pub_name) create index name_ndx on employee_names (emp_lname, emp_fname) Note: The second example shows the creation of a composite index. CREATE TABLE table_name (column_name datatype [NULL | NOT NULL] [, column_name datatype [NULL | NOT NULL] ]...) Ex: create table employee_names (SSN char(11) not null, emp_lname char(20), emp_fname char(15), emp_bdate datetime(mm"/"dd"/"yy)) create table cataloging_staff (catlgr_fname char(15), catlgr_lname char(20), workstation_num char(3)) Note: The optional insertion of NULL or NOT NULL after each column name allows you to specify whether you want to allow null values in the column or not. The DBMS will make a default choice if you fail to specify what you want (the ANSI standard default is to allow null values). If you specify NOT NULL, then the DBMS will not allow that column to be left empty when a record is entered into the table. CREATE VIEW view_name (column_name [, column_name]...) AS select_statement [WITH CHECK OPTION] Ex: create view young_emp (fname, midinit, lname) as select emp_fname, emp_midinitial, emp_lname from employee_names where emp_bdate after (TODAY - (18*365+4)) Note: If you specify WITH CHECK OPTION, then any update operations (DELETE, INSERT, UPDATE) to the table(s) in the select_statement will be checked for compliance with the conditions of the select_statement before they are permitted to take effect in the table(s). Hence, you should use this option only if you are sure that you want it to affect all update operations on the table(s). Note: "TODAY" is not a SQL function. It is used here merely to help illustrate the CREATE VIEW command. Nevertheless, it is true that most DBMSs incorporate a function like that symbolized by "TODAY", viz., a function that returns the current date in a form that can be operated on numerically. DELETE FROM table_name WHERE select_statement Ex: delete from employee_names where SSN = "123-45-6789" delete from monographs where copyright_date before 1910 and where language = "Dutch" DROP DATABASE database_name Note: This command removes the entire database--all its tables, views, and indexes--from your DBMS. Take care not to use this command when you really want to drop only a table or some other part of a database, not the whole database. DROP INDEX index_name DROP TABLE table_name DROP VIEW view_name GRANT {ALL | privilege_list} ON {table_name [column_list] | view_name [column_list]} TO {PUBLIC | user_list} Ex: grant insert, delete on monographs to cataloger_Jerry Note: Typical items on a privilege list would be such operations as DELETE, INSERT, SELECT, and UPDATE. Naturally, the user_list may include only persons (or other agents, such as terminals or nodes on a network) who are on the DBMS's list of authorized users of the database. INSERT INTO table_name [(column_list)] VALUES (value1, value2, ...) Ex: insert into employee_names values ("987-65-4321", "Lee", "Jerry", "07/04/68") insert into employee_names (SSN, emp_fname) values ("001-23-4456, "Socrates") insert into cataloging_staff ("Jerry", "Lee", "3") Note: The first example results in the addition of to the employee_names table of a new row with the entries specified. The second example will result in the addition to the employee_names table of a row with "001-23-4456" in the first column, "Socrates" in the third column, and nulls in the second and fourth columns. INSERT INTO table_name [(insertion_column_list)] SELECT [selection_column_list] FROM table_name WHERE search_specifications Ex: insert into employee_names select from applicant_names where SSN = "135-46-9878" Note: This example adds to the employee_names table a row whose contents match those of the row in the applicant_names table for which SSN = "135-46-9878". The applicant_names table is not changed by this copying and inserting operation. For this example we assume that the applicant_names table has the same structure as the employee_names table. If this were not the case, then the insertion_column_list and the selection_column_list would have to specify the columns involved, and these columns would have to match each other in characteristics including their domains. LOAD DATABASE database_name LOAD TRAN transaction_name REVOKE {ALL | privilege_list} ON {table_name [column_list] | view_name [column_list]} FROM {PUBLIC | user_list} Ex: revoke all on monographs from cataloger_Lee Note: The REVOKE command is, in effect, the reverse of GRANT. ROLLBACK TRAN transaction_name Ex: rollback tran sale_transaction SELECT [ALL | DISTINCT] select_column_list FROM {table_name | view_name} [, {table_name | view_name}] ... [WHERE selection_condition_statement] [GROUP BY column_name [, column_name]...] [HAVING search_conditions] [ORDER BY {column_name | select_list-number} [ASC | DESC] [, {column_name | select_list_number} [ASC | DESC]...} Ex: select emp_fname, emp_lname from employees where SSN="123-45-6789" select all from cataloging_staff order by workstation_num select SSN, emp_fname, emp_lname, workstation_num from employees, cataloging_staff where SSN="987-65-4321" and where catlgr_lname = emp-lname and where catlgr_fname=emp_fname select mono_title from monographs where au_lname="Asimov"and au_fname="Isaac"group by fict_non_fict_marker order by mono_title asc UPDATE table_name SET column_name = expression [WHERE search_conditions] Ex: update vendor_list set vendor_name = "Harcourt Brace" where vendor_name = "Harcourt Brace Jovanovitch" update user_list set user_street_address = "123 Main Street" where user_lname = "Smith" and where user_street_address = "456 Central Avenue" UPDATE STATISTICS table_name Note: This command is available in some versions of SQL. Where it is available, the DBMS manual will specify a standard set of statistics that is produced by the command. Aggregate Functions Aggregate functions are used in conjunction with SQL commands (mainly the SELECT command), in a way that the examples will show. AVG([DISTINCT] expression) Ex: select avg(acquisition_cost) from January_accession_list select avg(journal_price*1.17) from journal_subscriptions where subscription_year = 1994 Note: The second example provides an estimate of the average cost of a journal in 1995, assuming that this cost would increase by 17% over 1994, in keeping with the rate of inflation of journal costs in recent years. COUNT([DISTINCT] expression) Ex: select count(distinct SSN) from chargeout_list Note: This example would provide the number of different users who charged out items from the library during the period covered by the chargeout_list table. COUNT(*) Ex: select count(*) chargeout_list Note: This example would provide the total number of rows in the chargeout_list table, i.e., the number of items charged out during the period covered by the table. MAX(expression) Ex: select max(acquisition_cost) from January_accession_list MIN(expression) Ex: select min(acquisition_cost) from January_accession_list SUM([DISTINCT] expression) Ex: select sum(acquisition_cost) from January_accession_list select sum(acquisition_cost*0.95) from January_accession_list where vendor_name = "Vendor A" Note: The second example shows how much the library would have spent in January on the items it bought from Vendor A if it had bought them from Vendor B instead of Vendor A, assuming that Vendor B's net prices are 5% less than those of Vendor A.