Introduction to Relational Databases and SQL
Brian Panulla
Web 2004 Pre-Conference

Administrivia
• Objectives
• Agenda
• Prerequisites
• Schedule
• Logistics
• Presenter Introduction
• Attendee Introductions

Objectives
The primary objective of this session is to teach the basics of working with common Relational Databases, including data modeling and database design techniques. Participants will also receive an introduction to query building using Standard or Structured Query Language (SQL).

To that end, upon completion of the session you should be able to:
• Understand database concepts and terminology
• Design and create tables
• Use SQL to retrieve and analyze information
• Use SQL to enter and manipulate data

Agenda
Lesson 1: Database Concepts
Lesson 2: Entity-Relationship Modeling
Lesson 3: Creating Tables
Lesson 4: Using SELECT queries
Lesson 5: Using INSERT queries
Lesson 6: Using UPDATE queries
Lesson 7: Using DELETE queries
Lesson 8: Database Project

Prerequisites
• General experience in Web site development or management
• General experience in a scripting language (ColdFusion, PHP, ASP) or full programming language (C/C++, Java)

Schedule (Half-Day Session)
Start                 1:00 PM
Break (approximate)   3:00 PM
Adjourn               5:00 PM

Logistics
• Restrooms
• Drinking fountains, refreshments, snacks
• Laptops
• Messages/phones
• Security
• Emergency measures

Presenter
Brian Panulla
B.S. Science, PSU (2000)
Chief Information Officer, Campus Data Group, LLC
E-mail: bpanulla@psu.edu

Attendees
• Your name
• Organization name
• Current position
• Background in databases
• Expectations

Questions?!?!
Ask away!

Lesson 1: Database Concepts

Lesson 1 Objectives
A. Discuss database concepts and terminology
B. Learn database design principles

Database Terminology
• Database
• Table (or relation, entity)
• Row (or record, tuple)
• Column (or field, attribute)
• Data value

What is a database?
• Common databases
• Database tasks: Retrieving, Sorting, Summarizing, Inserting, Updating, Deleting
• Common DB software

What is a table?
• Tables are the fundamental component of any relational database.
• A table stores the data corresponding to a specific type of object. For example:
  • A Students table would store student information
  • An Orders table would store customer order information
• Tables are made up of rows and columns
• The columns of a table describe the characteristics of the information stored in that table
• The data in each row belongs to a given instance of the type of data stored in a particular table
• Each row contains one data value per column.
• The range of values that can go into a particular column of a row is called the domain of that column, and is generally restricted to data of a specific type (integers, character data, dates, etc.)

Sample Database Design
Name           Office Address        Locality                 Phone           Department
Bob Walker     212 S. Allen Street   State College, PA 16801  (814) 555-1111  Sales
Beth Adams     5251 Electric Avenue  Lewistown, PA 17044      (814) 555-0165  Engineering
Maggie Taylor  227 S. Allen Street   State College, PA 16803  (814) 555-0771  Marketing
Matt Peterson  212 S. Allen Street   State College, PA 16803  (814) 555-1111  Sales

What is a “Relational Database?”
• The term “relational database” comes from the mathematical definition of a relation, or set.
• All objects in a relation must have the same properties or characteristics
• The point? Tables group similar data or objects; use one table per set of objects
• Q: Is Excel a relational database system?

Database Design
• An art unto itself, database design skills are crucial in the development of efficient, stable Web applications
• Planning your database design should generally be one of the first tasks in a development project.
• Too often, however, the database is developed “as we go along,” creating problem after problem

Lesson 2: Entity-Relationship Modeling

Entity-Relationship Modeling
• Entity-Relationship (or E-R) Modeling is the name given to one particular method of relational database design
• The data to be stored in a database is categorized into entities.
• An entity is an example of one type of object to be modeled by the database. Entity names are typically nouns.

Some Basic Entities
• Student
• Course
• Professor

Exercise
Discuss some of the entities that would exist in databases for:
• the participants in a youth sporting league (baseball, softball, soccer, etc.)
• the inventory of an independent gift shop or other store

E-R Modeling: Attributes
Each entity has one or more attributes that further describe that entity or relationship.

An Entity with Attributes
[Diagram: the Students entity with attributes Name, Phone, ID, Email, GPA]

Exercise
Discuss some of the attributes that would describe the following entities:
• the Teams entity in the youth sporting league database
• the Products entity in the gift shop inventory database

E-R Modeling: Relationships
• A given entity can be correlated with other entities by way of relationships
• A relationship is typically named with a verb phrase:
  • A Person is a Student
  • A Book is published by a Publisher
• There may also be attributes that truly belong to the relationship and not to an entity

Relationships
[Diagram: entities Professor, Student, and Course; relationships Teaches and Enrolls In; attributes shown: Name, Phone, ID, Email, GPA, Year, Date, Grade]

Exercise
Devise the relationships that connect the entities in the databases discussed in one of the previous exercises:
• The youth sporting league database
• The gift shop inventory database

Lesson 3: Creating Tables

Lesson 3 Objectives
A. Convert an E-R Diagram to a relational schema
B. Identify improvements to the table design
C. Use the CREATE TABLE SQL statement to create tables in a database

Creating Tables
• With an E-R Model in hand, the actual creation of a database is very straightforward
• Each entity will generally become a table
• Any relationships with one or more attributes will also become a table
• Relationships without attributes may sometimes be modeled as a table

Ground Rules for Relational Database Design
1) No multi-part or multi-value fields
2) Eliminate redundant information
3) Avoid designs that call for the addition of columns to store new data
4) Avoid “anomalies”:
   a) Update
   b) Delete
   c) Insert

Sample Database Design
Name           Office Address        Locality                 Phone           Department
Bob Walker     212 S. Allen Street   State College, PA 16801  (814) 555-1111  Sales
Beth Adams     5251 Electric Avenue  Lewistown, PA 17044      (814) 555-0165  Engineering
Maggie Taylor  227 S. Allen Street   State College, PA 16803  (814) 555-0771  Marketing
Matt Peterson  212 S. Allen Street   State College, PA 16803  (814) 555-1111  Sales

Sample Database Design
1) No multi-part or multi-value fields

Sample Database Design
2) Eliminate redundant information

Sample Database Design
3) Avoid designs that call for the addition of columns to store new data
Name           Office Address        Locality                 Phone           Department 1  Department 2
Bob Walker     212 S. Allen Street   State College, PA 16801  (814) 555-1111  Sales         Marketing
Beth Adams     5251 Electric Avenue  Lewistown, PA 17044      (814) 555-0165  Engineering
Maggie Taylor  227 S. Allen Street   State College, PA 16803  (814) 555-0771  Marketing
Matt Peterson  212 S. Allen Street   State College, PA 16803  (814) 555-1111  Sales

Sample Database Design
4) Avoid anomalies:
• Update Anomaly
• Delete Anomaly
• Insert Anomaly
Name           Office Address        Locality                 Phone           Department
Bob Walker     212 S. Allen Street   State College, PA 16801  (814) 555-1111  Sales
Beth Adams     5251 Electric Avenue  Lewistown, PA 17044      (814) 555-0165  Engineering
Maggie Taylor  227 S. Allen Street   State College, PA 16803  (814) 555-0771  Marketing
Matt Peterson  212 S. Allen Street   State College, PA 16803  (814) 555-1111  Sales
                                                                              Manufacturing

Keys
A key is a field or set of fields that can be used to uniquely identify a particular record in a database table.
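As a minimal sketch of what "uniquely identify" means, the following Python snippet loads the sample table into an in-memory SQLite database and checks whether any combination of values repeats for a candidate set of columns. The table name `staff`, the lowercase column names, and the helper `is_candidate_key` are hypothetical, chosen only for this illustration.

```python
import sqlite3

# Sample rows from the slides: name, office address, locality, phone, department.
ROWS = [
    ("Bob Walker", "212 S. Allen Street", "State College, PA 16801", "(814) 555-1111", "Sales"),
    ("Beth Adams", "5251 Electric Avenue", "Lewistown, PA 17044", "(814) 555-0165", "Engineering"),
    ("Maggie Taylor", "227 S. Allen Street", "State College, PA 16803", "(814) 555-0771", "Marketing"),
    ("Matt Peterson", "212 S. Allen Street", "State College, PA 16803", "(814) 555-1111", "Sales"),
]


def is_candidate_key(conn, table, columns):
    """Return True if no combination of values in `columns` occurs more than once.

    Hypothetical helper: table and column names are trusted, not user input.
    """
    cols = ", ".join(columns)
    query = (
        f"SELECT COUNT(*) FROM "
        f"(SELECT {cols} FROM {table} GROUP BY {cols} HAVING COUNT(*) > 1)"
    )
    (duplicate_groups,) = conn.execute(query).fetchone()
    return duplicate_groups == 0


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff "
    "(name TEXT, office_address TEXT, locality TEXT, phone TEXT, department TEXT)"
)
conn.executemany("INSERT INTO staff VALUES (?, ?, ?, ?, ?)", ROWS)

# (name, department) is distinct for every row; phone is shared by two rows.
name_dept_unique = is_candidate_key(conn, "staff", ["name", "department"])
phone_unique = is_candidate_key(conn, "staff", ["phone"])
```

A check like this can only rule a key out for the data at hand. Whether a set of columns is a key in general remains a design decision about what the data is allowed to contain.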
Examples:
• Name, Department
• SSN
A Primary Key is a field or set of fields chosen by the database designer to be used to define relationships between tables

Primary Keys
Primary Key fields:
• Must contain unique, non-duplicated values (or sets of values in the case of multi-field keys)
• Cannot be NULL
In the event that no one column in a table is an appropriate Primary Key, an artificial primary key column is usually generated

Foreign Keys
• When the values of a Primary Key column in a table are shared by a common column in a child table, that column is called the Foreign Key of the relationship.
• Foreign Key fields must have the same data type and size as their related Primary Key field.
• Foreign Keys are used in other tables in order to maintain relationships between data values.
• By allowing us to split data into multiple related tables, foreign keys can help eliminate the anomalies described previously

A Better Design
Employees
EID  Last Name  First Name  OID
1    Walker     Bob         1
2    Adams      Beth        2
3    Taylor     Maggie      3
4    Peterson   Matt        1

DeptAssignments
EID  DID
1    1
2    2
3    3
4    1
1    3

Departments
DID  Department
1    Sales
2    Engineering
3    Marketing
4    Manufacturing

Offices
OID  Office Address        Locality       State  Zip    Phone
1    212 S. Allen Street   State College  PA     16801  (814) 555-1111
2    5251 Electric Avenue  Lewistown      PA     17044  (814) 555-0165
3    227 S. Allen Street   State College  PA     16801  (814) 555-0771

The Relational Schema
Employees:        EID, OID, FirstName, Lastname
Offices:          OID, Address, Locality, State, Zip, Phone
Departments:      DID, DeptName
DeptAssignments:  EID, DID

Exercise
Develop a relational schema for one of the databases from the previous lesson

An Introduction to Relationships
• Cardinality
• One-to-Many Relationships
• One-to-One Relationships
• Many-to-Many Relationships

Cardinality
• Cardinality is the term used to describe the character of a relationship
• Three types of cardinality exist:
  • One to Many
  • Many to Many
  • One to One

One-to-Many Relationships
[Tables shown: Employees (EID, Last Name, First Name, OID) and Offices (OID, Office Address, Locality, State, Zip, Phone), with the same rows as in A Better Design]
Each row in Employees relates to only one row in Offices, while each row in Offices may relate to many rows in Employees

Many-to-Many Relationships
[Tables shown: Employees (EID, Last Name, First Name, OID) and Departments (DID, Department)]
Each row in Employees may relate to many rows in Departments, and vice versa.

Many-to-Many Relationships
[Tables shown: Employees, the EID/DID join table, and Departments]
A Many-to-Many Relationship can usually be modeled by adding an additional “join table,” creating two One-to-Many Relationships instead.

One-to-One Relationships
EID  Last Name  First Name  OID
1    Walker     Bob         1
2    Adams      Beth        2
3    Taylor     Maggie      3
4    Peterson   Matt        1

EID  Salary
1    $35,200
2    $75,000
3    $45,000
4    $22,000

A One-to-One Relationship is usually created for security and/or performance reasons.
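The one-to-one split can be sketched with Python's built-in sqlite3 module: salaries live in their own table keyed by the same EID, and a join brings the two halves back together. The lowercase table and column names are illustrative assumptions, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        eid        INTEGER PRIMARY KEY,
        last_name  TEXT,
        first_name TEXT,
        oid        INTEGER
    );
    -- eid is the primary key here and also points back at employees,
    -- so each employee can have at most one salary row.
    CREATE TABLE salaries (
        eid    INTEGER PRIMARY KEY REFERENCES employees(eid),
        salary INTEGER
    );
    INSERT INTO employees VALUES
        (1, 'Walker', 'Bob', 1), (2, 'Adams', 'Beth', 2),
        (3, 'Taylor', 'Maggie', 3), (4, 'Peterson', 'Matt', 1);
    INSERT INTO salaries VALUES (1, 35200), (2, 75000), (3, 45000), (4, 22000);
""")

# Joining on the shared key yields one combined row per employee.
combined = conn.execute("""
    SELECT e.eid, e.last_name, e.first_name, e.oid, s.salary
    FROM employees e
    INNER JOIN salaries s ON e.eid = s.eid
    ORDER BY e.eid
""").fetchall()
```

Keeping salaries in a separate table lets access to them be restricted independently of the rest of the employee record, which is one reading of the security motivation mentioned on the slide.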
Technically, the two tables involved in a One-to-One relationship could be combined into one table with no loss of data

Exercise
Describe the cardinality of the relationships in your relational schema

Data Types
• Each column of a table must have a specified data type supported by the database software.
• The database will enforce the requirement that data entered into a field must comply with the field’s data type
• Choose the smallest data type appropriate for each column, leaving room for future expansion (No Y2K-like bugs!)

Data Types
• INTEGER
• FLOAT (floating point numbers)
• NUMERIC (real numbers)
• CHAR
• VARCHAR
• TEXT
• TIMESTAMP
• DATE

Creating Tables
• Choose good field and table names
  • No spaces or special characters (other than the underscore “_”)
  • Use meaningful names
• The Leszynski Naming Convention (or LNC) is a scheme for helping you create more meaningful table and field names.

LNC Naming
• Originally suggested by Stan Leszynski and Greg Reddick in a paper entitled "The Leszynski/Reddick Guidelines for Access 1.x, 2.x", it has become the convention used by many developers.
• This naming convention suggests that you precede object names with three letter "tags" to help developers work with unfamiliar database objects.
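The naming rule above (no spaces or special characters other than the underscore) can be checked mechanically. The sketch below is one possible check; the function name `is_safe_name` is hypothetical, and the extra rule that a name must not start with a digit is an assumption added here, since databases differ on what they accept.

```python
import re

# Letters, digits, and underscores only. The "no leading digit" rule is an
# assumption for this sketch; individual databases allow more or less.
VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_safe_name(name):
    """Return True if `name` uses only letters, digits, and underscores."""
    return VALID_NAME.fullmatch(name) is not None


checks = {
    "tblAccounts": is_safe_name("tblAccounts"),
    "vc_sort_name": is_safe_name("vc_sort_name"),
    "Office Address": is_safe_name("Office Address"),
    "2ndPhone": is_safe_name("2ndPhone"),
}
```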
LNC Prefixes
Data Type:
• n (integer or numeric)
• f (float)
• c (char)
• vc (varchar)
• dt (datetime or timestamp)
Role:
• pk (primary key)
• fk (foreign key)
and:
• tbl (table)
• vw (view)
• sp (stored procedure)

A Sample Database
tblAccounts
    npkAccountID      SERIAL        not null
    nfkInstitutionID  INTEGER       not null
    dtCreated         TIMESTAMP     not null

tblUsers
    npkUserID         SERIAL        not null
    nfkAccountID      INTEGER       not null
    vcUsername        VARCHAR(24)   not null
    vcPassword        VARCHAR(32)   not null
    vcNameFirst       VARCHAR(24)   null
    vcNameLast        VARCHAR(24)   null

tblInstitutions
    npkInstitutionID  SERIAL        not null
    vcFullName        VARCHAR(255)  not null
    vcSortName        VARCHAR(255)  not null
    vcCity            VARCHAR(255)  null
    cfkState          CHAR(2)       not null

Relationships:
    npkAccountID = nfkAccountID          [0,n]
    npkInstitutionID = nfkInstitutionID  [0,n]

CREATE TABLE
You can build a table in a SQL-compatible database with the CREATE TABLE statement

    create table Offices (
        OfficeID  integer primary key,
        Address   varchar(32),
        Locality  varchar(32),
        State     char(2),
        Zip       varchar(10),
        Phone     varchar(12)
    )

NULL Values
• Relational databases have a special value, called the NULL, that can be used as a placeholder when information is unknown, missing, or to be filled in later.
• NULL values are different from:
  • Zero
  • Empty strings
• A column of any data type can be NULL, though PRIMARY KEY columns can never be NULL

Default values
• Each column in a table may have a default value specified. The database will set the column to that value if no value is passed during insert
• If no default value is specified, the column will be set to NULL if possible.

CREATE TABLE with defaults
Our CREATE TABLE from the previous step with NULL preferences and defaults:

    create table Offices (
        OfficeID  integer      NOT NULL primary key,
        Address   varchar(32)  NULL,
        Locality  varchar(32)  NULL DEFAULT 'State College',
        State     char(2)      NOT NULL DEFAULT 'PA',
        Zip       varchar(10)  NULL,
        Phone     varchar(12)  NULL
    )

Lesson 3 Review
• Why is the planning step important when creating a database?
• Why is separating data into multiple tables a good idea?

Exercise
Practice making tables in your database by building on the schema you developed in previous exercises.

Lesson 4: SQL: Retrieving Data

SQL
• SQL (Structured or Standard Query Language) is defined by several ANSI standards
• Despite being “standardized,” every DB platform may implement some of the language syntax differently

SQL: Select
A SELECT query is the most common type of SQL query:

    SELECT *
    FROM employees

This query will retrieve all the columns of all of the rows from the table employees.

SQL: Select
To retrieve only some of the columns of a table, list them after SELECT:

    SELECT empID, lastname, firstname
    FROM employees

SQL: Select
A WHERE clause may be added to selectively retrieve only those rows in a table that match a given constraint:

    SELECT *
    FROM employees
    WHERE lastname = 'smith'

Only employee rows matching the last name of “smith” will be returned. Normal equality or inequality operators (< > =) may be used.

SQL: Select
Multiple conditions may be added to the WHERE clause, using AND, OR, and NOT as operators and parentheses as appropriate:

    SELECT *
    FROM employees
    WHERE department = 'sales'
        AND (lastname = 'smith' OR lastname = 'jones')

Exercise
Using a WHERE clause

SQL: Select
Add an ORDER BY clause to sort the records:

    SELECT empID, lastname, firstname
    FROM employees
    WHERE department = 'sales'
    ORDER BY lastname

The keyword DESC can be used to order from highest to lowest values

Exercise
Sorting a Query

Joins
• Table joins allow the database developer to combine information from related tables into a single, coherent record set and return those records to the program
• While the standard type of join, the inner join, can often provide the information needed, additional join types are available to control how the database combines the data in two tables

Inner Joins
Inner Joins are the typical join types encountered in SQL.
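To make the idea of combining related tables concrete, here is a small sketch using Python's built-in sqlite3 module. It reuses the Employees and Offices rows from the earlier design; the lowercase table and column names are illustrative choices rather than the original schema. The query also shows WHERE and ORDER BY working alongside the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE offices (
        oid      INTEGER PRIMARY KEY,
        address  TEXT,
        locality TEXT
    );
    CREATE TABLE employees (
        eid        INTEGER PRIMARY KEY,
        last_name  TEXT,
        first_name TEXT,
        oid        INTEGER REFERENCES offices(oid)
    );
    INSERT INTO offices VALUES
        (1, '212 S. Allen Street', 'State College'),
        (2, '5251 Electric Avenue', 'Lewistown'),
        (3, '227 S. Allen Street', 'State College');
    INSERT INTO employees VALUES
        (1, 'Walker', 'Bob', 1), (2, 'Adams', 'Beth', 2),
        (3, 'Taylor', 'Maggie', 3), (4, 'Peterson', 'Matt', 1);
""")

# Each employee row is paired with the office row that shares its oid;
# WHERE then filters the combined rows and ORDER BY sorts them.
rows = conn.execute("""
    SELECT e.last_name, o.address
    FROM employees e
    INNER JOIN offices o ON e.oid = o.oid
    WHERE o.locality = 'State College'
    ORDER BY e.last_name
""").fetchall()
```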
In a two table inner join, fields from records from the first table that can be correlated to records in the second table appear “side-by-side” with fields from the second table in the resulting record set

Join Types
ID  Name   Dept
1   Brian  1
2   Karen  2
3   Mary   3
4   Dave   1

DeptID  DeptName
1       Engineering
2       Education
3       Marketing
4       Sales

Diagramming Joins
Cartesian Product
ID  Name   Dept  DeptID  DeptName
1   Brian  1     1       Engineering
1   Brian  1     2       Education
1   Brian  1     3       Marketing
1   Brian  1     4       Sales
2   Karen  2     1       Engineering
2   Karen  2     2       Education
2   Karen  2     3       Marketing
2   Karen  2     4       Sales
3   Mary   3     1       Engineering
3   Mary   3     2       Education
3   Mary   3     3       Marketing
3   Mary   3     4       Sales
4   Dave   1     1       Engineering
4   Dave   1     2       Education
4   Dave   1     3       Marketing
4   Dave   1     4       Sales

Inner Joins
Inner Join
ID  Name   Dept  DeptID  DeptName
1   Brian  1     1       Engineering
2   Karen  2     2       Education
3   Mary   3     3       Marketing
4   Dave   1     1       Engineering

Inner Joins
Two versions of the join syntax exist, corresponding to two different versions of the ANSI SQL Standard.
Traditional syntax:

    SELECT *
    FROM tblAccounts A, tblUsers U
    WHERE A.npkAccountID = U.nfkAccountID
        AND dtCreated > '1/1/2003'

Instructions on how to join the tables listed in the FROM clause are placed in the WHERE clause

Inner Joins
The newer ANSI SQL 92 syntax makes it easier to write other join types:

    SELECT *
    FROM tblAccounts A INNER JOIN tblUsers U
        ON A.npkAccountID = U.nfkAccountID
    WHERE dtCreated > '1/1/2003'

Instructions on how to join the tables listed in the FROM clause are placed in a new ON clause, leaving the WHERE clause for filtering records

Inner Joins - more tables

    SELECT *
    FROM (tblAccounts A INNER JOIN tblUsers U
            ON A.npkAccountID = U.nfkAccountID)
        INNER JOIN tblInstitutions I
            ON A.nfkInstitutionID = I.npkInstitutionID
    WHERE dtCreated > '1/1/2003'

Part I: Advanced SQL

Outer Joins
• Outer Joins are an extremely powerful query feature available in SQL. Outer joins are supported in nearly all major RDBMSs
• In an outer join, all records from one table will be included in the resulting recordset, along with any records in the second table that can be correlated to records in the first table.
• As with inner joins, fields from both tables appear “side-by-side” in the resulting record set, but fields from records that have no counterpart in the second table receive NULL values.

Outer Joins
Outer joins are directional. An outer join may either be a left outer join or a right outer join. By specifying the direction of the join, you tell your database from which table it should include all records:

    SELECT *
    FROM tblInstitutions I LEFT OUTER JOIN tblAccounts A
        ON I.npkInstitutionID = A.nfkInstitutionID

will include all records from the table tblInstitutions.

Outer Joins
Outer Join
ID    Name   Dept  DeptID  DeptName
1     Brian  1     1       Engineering
2     Karen  2     2       Education
3     Mary   3     3       Marketing
4     Dave   1     1       Engineering
NULL  NULL   NULL  4       Sales

Outer Joins
Often, the table pointed to by the direction of an outer join will be the parent table in a relationship.

    tblAccounts A LEFT OUTER JOIN tblUsers U

makes much more sense than:

    tblUsers U LEFT OUTER JOIN tblAccounts A

as tblUsers shouldn’t contain any rows that don’t relate to an Account, but an Account may have no users defined.

Aggregate Functions
Nearly all RDBMSs have support for basic aggregate functions (also called set functions): functions that operate on a set of values and return a single value.
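Both ideas can be tried out in a few lines with Python's built-in sqlite3 module. The sketch below first runs a left outer join, where an account without users still appears with NULL in the user column, and then applies aggregate functions that reduce a set of rows to a single row. The `accounts` and `users` tables and their data are hypothetical, made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        account_id INTEGER PRIMARY KEY,
        revenue    REAL
    );
    CREATE TABLE users (
        user_id    INTEGER PRIMARY KEY,
        account_id INTEGER REFERENCES accounts(account_id),
        username   TEXT
    );
    INSERT INTO accounts VALUES (1, 100.0), (2, 250.0), (3, 400.0);
    INSERT INTO users VALUES (1, 1, 'bwalker'), (2, 1, 'badams'), (3, 2, 'mtaylor');
""")

# Left outer join: every account appears. Account 3 has no users,
# so its username comes back as NULL (None in Python).
joined = conn.execute("""
    SELECT a.account_id, u.username
    FROM accounts a
    LEFT OUTER JOIN users u ON a.account_id = u.account_id
    ORDER BY a.account_id, u.user_id
""").fetchall()

# Aggregate functions: the three account rows collapse into one result row.
summary = conn.execute("""
    SELECT count(*), max(revenue), min(revenue), avg(revenue)
    FROM accounts
""").fetchone()
```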
Examples:
• Count(*)
• Sum(column)
• Max(column)
• Min(column)
Aggregate functions collapse your recordset into a single record
For example:

    SELECT count(*) as RecordCount,
        max(fRevenue) as MaxRevenue,
        min(fRevenue) as MinRevenue,
        avg(fRevenue) as AvgRevenue
    FROM tblModuleData_FoodSvc

New tables
tblDemographics
    npkDemogID           SERIAL         not null
    nfkAccountID         INTEGER        not null
    nfkAcadYearID        INTEGER        not null
    nFTEEnrollment       INTEGER        not null
    dtUpdated            TIMESTAMP      null

tblModuleData_FoodSvc
    nfkDemogID           INTEGER        not null
    nMealPlanEnrollment  INTEGER        not null
    nPotentialMeals      INTEGER        not null
    nActualMeals         INTEGER        not null
    nTotalMeals          INTEGER        not null
    fMealPlanSales       NUMERIC(15,2)  not null
    fCateringSales       NUMERIC(15,2)  not null
    fConferenceSales     NUMERIC(15,2)  not null
    fRevenue             NUMERIC(15,2)  not null
    fFoodBevCosts        NUMERIC(15,2)  not null
    fLaborHours          FLOAT          not null
    fSalariesWages       NUMERIC(15,2)  not null
    fLaborCosts          NUMERIC(15,2)  not null
    fNonFoodCosts        NUMERIC(15,2)  not null
    fDirectCosts         NUMERIC(15,2)  not null
    fOtherCosts          NUMERIC(15,2)  not null
    nfkMgmtArrangeID     SMALLINT       not null
    nfkMgmtFirmID        INTEGER        null

tblAccounts
    npkAccountID         SERIAL         not null
    nfkAccountRepID      INTEGER        not null
    nfkInstitutionID     INTEGER        not null
    dtCreated            TIMESTAMP      not null

tblAccountReps
    npkAccountRepID      SERIAL         not null
    vcNameFirst          VARCHAR(24)    null
    vcNameLast           VARCHAR(24)    null

Relationships:
    npkDemogID = nfkDemogID            [0,n]
    npkAccountID = nfkAccountID        [0,n]
    npkAccountRepID = nfkAccountRepID  [0,n]

Grouping Records
You may split your recordset into two or more groups according to the values of one of the columns in your query:

    SELECT nfkAccountID,
        count(*) as RecordCount,
        max(fRevenue) as MaxRevenue,
        min(fRevenue) as MinRevenue,
        avg(fRevenue) as AvgRevenue
    FROM tblModuleData_FoodSvc M INNER JOIN tblDemographics D
        ON M.nfkDemogID = D.npkDemogID
    GROUP BY nfkAccountID

Subqueries
Many RDBMSs allow you to nest queries into a single operation.
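As a minimal sketch of nesting, the snippet below groups one table inside a subquery and then joins the grouped result to another table, all in a single statement. It runs against SQLite through Python's built-in sqlite3 module; the `accounts` and `users` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users (
        user_id    INTEGER PRIMARY KEY,
        account_id INTEGER REFERENCES accounts(account_id),
        username   TEXT
    );
    INSERT INTO accounts VALUES (1, 'North Campus'), (2, 'South Campus');
    INSERT INTO users VALUES (1, 1, 'alice'), (2, 1, 'bob'), (3, 2, 'carol');
""")

# The inner query produces one row per account with its user count;
# the outer query treats that result like a table and joins to it.
rows = conn.execute("""
    SELECT a.name, c.user_count
    FROM accounts a
    INNER JOIN (SELECT account_id, count(*) AS user_count
                FROM users
                GROUP BY account_id) c
        ON a.account_id = c.account_id
    ORDER BY a.account_id
""").fetchall()
```

Giving the counted column an alias (`user_count` here) makes it easy to refer to from the outer query.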
The two most common subquery examples are:
• “Virtual Tables”
• “IN-clause” subqueries

Subqueries
“Virtual table” subqueries allow you to use the result of a query as a temporary table in a second query:

    SELECT *
    FROM tblAccounts A INNER JOIN
        (SELECT nfkAccountID, count(*)
         FROM tblUsers U
         GROUP BY nfkAccountID) U
        ON A.npkAccountID = U.nfkAccountID

This can be used to reduce the number of fields in your GROUP BY statement in the outer query

Subqueries
“IN clause” subqueries allow you to filter your records according to the values obtained by a subquery. Conditions on an aggregate such as count(*) belong in a HAVING clause after GROUP BY, not in WHERE:

    SELECT *
    FROM tblUsers U
    WHERE nfkAccountID IN
        (SELECT npkAccountID
         FROM tblAccounts
         GROUP BY nfkAccountRepID
         HAVING count(*) > 2)

Subquery Example
• EduSoft’s portal Web site is meant to be used by multiple individuals at a given institution. If multiple users haven’t been created within three months of account creation, chances are the main user at the institution is not making good use of the account and may need help.
• Find all Accounts opened before 1/1/03 for which only one user login has been established, and pass that information on to our Customer Service manager

Subquery Example

    SELECT npkAccountID, dtCreated
    FROM tblAccounts
    WHERE npkAccountID IN
        (SELECT nfkAccountID
         FROM tblUsers
         GROUP BY nfkAccountID
         HAVING count(*) = 1)
        AND dtCreated < '1/1/2003'

Stored Procedures
• Stored Procedures allow you to pre-compile complicated queries on the database server
• Most RDBMSs that support SPs provide you with a fairly functional programming language in which to write them:
  • Transact-SQL (T-SQL) in MS SQL Server
  • PL/SQL in Oracle

Stored Procedures
• On the server side, the SP is written in your RDBMS’s dialect of SQL
• This SP is from MS SQL Server 7

Stored Procedures
SPs are excellent components for two main reasons:
• Database logic encapsulation - have all queries coded into SPs by your DBA, and made available to developers
• Performance - the SQL does not need to be interpreted and compiled on each execution, saving processor time
Big uses for SPs include ad banner or click tracking, and autonumber tables for DBMSs without sequences
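The single-user-account search from the subquery example can be run end to end against SQLite with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the sample rows are invented, only the columns the query needs are created, dates are stored as ISO 8601 text (so plain string comparison orders them correctly in SQLite), and the count filter is written with HAVING because standard SQL does not allow aggregates in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblAccounts (
        npkAccountID INTEGER PRIMARY KEY,
        dtCreated    TEXT NOT NULL  -- ISO 8601 date text (assumption for SQLite)
    );
    CREATE TABLE tblUsers (
        npkUserID    INTEGER PRIMARY KEY,
        nfkAccountID INTEGER NOT NULL REFERENCES tblAccounts(npkAccountID),
        vcUsername   TEXT NOT NULL
    );
    -- Invented sample data: account 2 has two users, accounts 1 and 3 have one.
    INSERT INTO tblAccounts VALUES
        (1, '2002-06-01'), (2, '2002-09-15'), (3, '2003-02-01');
    INSERT INTO tblUsers VALUES
        (1, 1, 'alice'),
        (2, 2, 'bob'), (3, 2, 'carol'),
        (4, 3, 'dave');
""")

# The subquery keeps only account IDs that occur exactly once in tblUsers;
# the outer query then applies the creation-date cutoff.
single_user_accounts = conn.execute("""
    SELECT npkAccountID, dtCreated
    FROM tblAccounts
    WHERE npkAccountID IN (SELECT nfkAccountID
                           FROM tblUsers
                           GROUP BY nfkAccountID
                           HAVING count(*) = 1)
      AND dtCreated < '2003-01-01'
    ORDER BY npkAccountID
""").fetchall()
```

Only account 1 qualifies: account 2 has two users, and account 3 was created after the cutoff.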