Introduction
Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use

Database Management System (DBMS)
- A DBMS contains information about a particular enterprise:
  - a collection of interrelated data
  - a set of programs to access the data
  - an environment that is both convenient and efficient to use
- Database applications:
  - Banking: transactions
  - Airlines: reservations, schedules
  - Universities: registration, grades
  - Sales: customers, products, purchases
  - Online retailers: order tracking, customized recommendations
  - Manufacturing: production, inventory, orders, supply chain
  - Human resources: employee records, salaries, tax deductions

History of Database Systems (1)
- 1950s and early 1960s:
  - Data processing using magnetic tapes for storage; tapes provided only sequential access
  - Punched cards for input
- In the early days, database applications were built directly on top of file systems

Drawbacks of using file systems to store data
- Data redundancy and inconsistency: multiple file formats, duplication of information in different files
- Difficulty in accessing data: a new program must be written to carry out each new task
- Data isolation: data is scattered across multiple files and formats
- Integrity problems:
  - Integrity constraints (e.g., account balance > 0) become "buried" in program code rather than being stated explicitly
  - It is hard to add new constraints or change existing ones
- Atomicity of updates:
  - Failures may leave the database in an inconsistent state, with only some of the updates carried out
  - Example: a transfer of funds from one account to another should either complete or not happen at all
- Concurrent access by multiple users:
  - Concurrent access is needed for performance
  - Uncontrolled concurrent accesses can lead to inconsistencies
    - Example: two people read the same balance (say 100) and each withdraws money (say 50) at the same time. Both write back 50, so one withdrawal is lost.
- Security problems: it is hard to give users access to some, but not all, of the data
- Database systems offer solutions to all of the above problems.

Levels of Abstraction in a DBMS
- Physical level: describes how a record (e.g., customer) is stored.
- Logical level: describes the data stored in the database, and the relationships among the data:
    type instructor = record
        ID : string;
        name : string;
        dept_name : string;
        salary : integer;
    end;
- View level: application programs hide details of data types. Views can also hide information (such as an employee's salary) for security purposes.
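One of the file-system drawbacks listed above is atomicity: a funds transfer must either complete or not happen at all. A DBMS transaction provides exactly that guarantee. The following is a minimal sketch using Python's built-in sqlite3 module. It assumes a hypothetical account table whose `check (balance >= 0)` constraint stands in for the "account balance > 0" rule. The helper names `make_bank`, `transfer`, and `balances` are illustrative and do not come from the slides.

```python
import sqlite3


def make_bank() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical account table.

    The CHECK constraint states the integrity rule explicitly instead of
    burying it in application code (assumed rule: balance stays >= 0).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table account ("
        " id text primary key,"
        " balance integer not null check (balance >= 0))"
    )
    conn.executemany("insert into account values (?, ?)", [("A", 100), ("B", 0)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst atomically: both updates happen or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            # Credit first, so a failing debit would otherwise leave a partial update.
            conn.execute("update account set balance = balance + ? where id = ?", (amount, dst))
            conn.execute("update account set balance = balance - ? where id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected the debit
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("select id, balance from account order by id"))


if __name__ == "__main__":
    bank = make_bank()
    print(transfer(bank, "A", "B", 30), balances(bank))   # True {'A': 70, 'B': 30}
    print(transfer(bank, "A", "B", 150), balances(bank))  # False {'A': 70, 'B': 30}
```

In the failing transfer, B has already been credited by the time the debit of A violates the constraint. The rollback undoes that credit, so neither update survives.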

History of Database Systems (2)
- Late 1960s and 1970s:
  - Hard disks allowed direct access to data
  - Network and hierarchical data models in widespread use
  - Ted Codd defines the relational data model
    - Would win the ACM Turing Award for this work
    - IBM Research begins System R prototype
    - UC Berkeley begins Ingres prototype
  - High-performance (for the era) transaction processing

Data Models
- A collection of tools for describing:
  - data
  - data relationships
  - data semantics
  - data constraints
- Relational model
- Entity-Relationship data model (mainly for database design)
- Object-based data models (object-oriented and object-relational)
- Semistructured data model (XML)
- Other, older models:
  - network model
  - hierarchical model

Relational Model
- Example of tabular data in the relational model
  [Figure: a sample table with its columns and rows labeled]

A Sample Relational Database
[Figure: sample relational database]

Data Definition Language (DDL)
- Specification notation for defining the database schema (the language for accessing and manipulating the data organized by the data model is the data-manipulation language, DML)
- Example:
    create table instructor (
        ID        char(5),
        name      varchar(20),
        dept_name varchar(20),
        salary    numeric(8,2));
- The DDL compiler generates a set of table templates stored in a data dictionary
- The data dictionary contains metadata (i.e., data about data):
  - database schema
  - integrity constraints
    - primary key (ID uniquely identifies instructors)
    - referential integrity (the references constraint in SQL), e.g., the dept_name value in any instructor tuple must appear in the department relation
  - authorization

SQL
- SQL: a widely used non-procedural language
  - Example: find the name of the instructor with ID 22222:
      select name
      from instructor
      where instructor.ID = '22222'
  - Example: find the ID and building of instructors in the Physics department:
      select instructor.ID, department.building
      from instructor, department
      where instructor.dept_name = department.dept_name
        and department.dept_name = 'Physics'
- Application programs generally access databases through one of:
  - language extensions that allow embedded SQL
  - an application program interface (e.g., ODBC/JDBC) that allows SQL queries to be sent to a database

Modes of access to DBMS
[Figure: modes of access to a DBMS]

Application Programs and User Interfaces
- Most database users do not use a query language like SQL
- An application program acts as the intermediary between users and the database
  - Applications are split into:
    - front end
    - middle layer
    - back end
- Front end: the user interface
  - forms
  - graphical user interfaces
  - many interfaces are Web-based

Application Architecture Evolution
- Three distinct eras of application architecture:
  - mainframe (1960s and 1970s)
  - personal computer era (1980s)
  - Web era (1990s onward)

Application Architecture in the Web Era
- Model-view-controller (MVC) architecture:
  - model: business logic
  - view: presentation of data; depends on the display device
  - controller: receives events, executes actions, and returns a view to the user
- Data access layer:
  - interfaces between the business logic layer and the underlying database
  - provides a mapping from the object model of the business layer to the relational model of the database

Database Design
The process of designing the general structure of the database:
- Logical design: deciding on the database schema. Database design requires that we find a "good" collection of relation schemas.
  - Business decision: what attributes should we record in the database?
  - Computer science decision: what relation schemas should we have, and how should the attributes be distributed among the various relation schemas?
- Physical design: deciding on the physical layout of the database

Design Approaches
- Entity-Relationship model
  - Models an enterprise as a collection of entities and relationships
    - Entity: a "thing" or "object" in the enterprise that is distinguishable from other objects; described by a set of attributes
    - Relationship: an association among several entities
  - Represented diagrammatically by an entity-relationship diagram
- Normalization theory
  - Formalizes which designs are bad, and tests for them

Database Design?
- Is there any problem with this design?
  [Figure: a proposed table design]

The Entity-Relationship Model
[Figure: entity-relationship diagram for the university database]
- What happened to dept_name of instructor and student?
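The DDL, the data dictionary, and the two SQL examples from this introduction can be tried together against SQLite through Python's sqlite3 module. In this sketch, SQLite's catalog table `sqlite_master` and `pragma table_info` stand in for the data dictionary. The department columns other than dept_name and building, and all inserted rows, are illustrative assumptions. `table_names` and `column_names` are hypothetical helpers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL as on the slides; extra department columns and all rows are illustrative.
conn.executescript("""
create table department (
    dept_name varchar(20),
    building  varchar(15),
    budget    numeric(12,2));
create table instructor (
    ID        char(5),
    name      varchar(20),
    dept_name varchar(20),
    salary    numeric(8,2));
insert into department values ('Physics', 'Watson', 70000), ('Biology', 'Watson', 90000);
insert into instructor values ('22222', 'Einstein', 'Physics', 95000),
                              ('76766', 'Crick', 'Biology', 72000);
""")


def table_names(conn: sqlite3.Connection) -> list:
    """Read SQLite's data dictionary (the sqlite_master catalog) for table names."""
    rows = conn.execute("select name from sqlite_master where type = 'table' order by name")
    return [name for (name,) in rows]


def column_names(conn: sqlite3.Connection, table: str) -> list:
    """Column metadata; each pragma row is (cid, name, type, notnull, dflt_value, pk).

    The table name is interpolated into the SQL text, so pass only trusted names.
    """
    return [row[1] for row in conn.execute(f"pragma table_info({table})")]


# The two SQL queries from the slides, run unchanged.
name_22222 = conn.execute(
    "select name from instructor where instructor.ID = '22222'").fetchall()
physics = conn.execute("""
    select instructor.ID, department.building
    from instructor, department
    where instructor.dept_name = department.dept_name
      and department.dept_name = 'Physics'""").fetchall()

print(table_names(conn))                 # ['department', 'instructor']
print(column_names(conn, "instructor"))  # ['ID', 'name', 'dept_name', 'salary']
print(name_22222, physics)               # [('Einstein',)] [('22222', 'Watson')]
```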

History (3)
- 1980s:
  - Research relational prototypes evolve into commercial systems
    - SQL becomes the industrial standard
  - Parallel and distributed database systems
  - Object-oriented database systems
- 1990s:
  - Large decision-support and data-mining applications
  - Large multi-terabyte data warehouses
  - Emergence of Web commerce
- Early 2000s:
  - XML and XQuery standards
  - Automated database administration
- Later 2000s:
  - Giant data storage systems
    - Google BigTable, Yahoo PNUTS, Amazon, ...

Relational Model

Example of a Relation
[Figure: the instructor relation, with attributes (columns) and tuples (rows) labeled]

Attribute Types
- The set of allowed values for each attribute is called the domain of the attribute
- Attribute values are (normally) required to be atomic; that is, indivisible
- The special value null is a member of every domain
  - The null value causes complications in the definition of many operations

Relation Schema and Instance
- $A_1, A_2, \ldots, A_n$ are attributes
- $R = (A_1, A_2, \ldots, A_n)$ is a relation schema
  - Example: instructor = (ID, name, dept_name, salary)
- Formally, given sets $D_1, D_2, \ldots, D_n$, a relation r is a subset of $D_1 \times D_2 \times \cdots \times D_n$. Thus, a relation is a set of n-tuples $(a_1, a_2, \ldots, a_n)$ where each $a_i \in D_i$
- The current values (relation instance) of a relation are specified by a table
- An element t of r is a tuple, represented by a row in a table

Relations are Unordered
- Order of tuples is irrelevant (tuples may be stored in an arbitrary order)
  - Example: instructor relation with unordered tuples
    [Figure: the instructor relation with its tuples in a different order]
- Tuples are not repeated (each appears only once)
  - Example: the two depictions of the instructor relation are equivalent

Database
- A database consists of multiple relations
- Information about an enterprise (e.g., a university) is broken up into parts: instructor, student, advisor
- Bad design: univ (instructor_ID, name, dept_name, salary, student_ID, ...) results in
  - repetition of information (e.g., when two students have the same instructor)
  - the need for null values (e.g., to represent a student with no advisor)
- Normalization theory deals with how to design "good" relational schemas

Keys
- $R = (A_1, A_2, \ldots, A_n)$ is a relation schema; let $K \subseteq R$
- K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation r(R)
  - Example: {ID} and {ID, name} are both superkeys of instructor
- Superkey K is a candidate key if K is minimal
  - Example: {ID} is a candidate key for instructor
- One of the candidate keys is selected to be the primary key
  - Which one?
    Chosen by the database designer
  - Its attributes should never, or very rarely, change
- Foreign key constraint: a value in one relation must appear in another
  - referencing relation
  - referenced relation

Schema Diagram for University Database
[Figure: schema diagram for the university database]

Relational Query Languages
- How do we retrieve the entries of a database?
- Procedural vs. non-procedural (declarative) languages
- "Pure" languages:
  - relational algebra
  - tuple relational calculus
  - domain relational calculus
- Relational operators

Selection of tuples
- Relation r: [Figure: relation r]
- Select the tuples with A = B and D > 5: $\sigma_{A=B \wedge D>5}(r)$

Selection of Columns (Attributes)
- Relation r: [Figure: relation r]
- Select A and C (projection): $\Pi_{A,C}(r)$

Joining two relations: Cartesian Product
- Relations r, s: [Figure: relations r and s]
- $r \times s$: [Figure: the result]

Union of two relations
- Relations r, s: [Figure: relations r and s]
- $r \cup s$: [Figure: the result]

Set difference of two relations
- Relations r, s: [Figure: relations r and s]
- $r - s$: [Figure: the result]

Set Intersection of two relations
- Relations r, s: [Figure: relations r and s]
- $r \cap s$: [Figure: the result]

Joining two relations: Natural Join
- Let r and s be relations on schemas R and S respectively. Then the "natural join" of relations r and s is a relation on schema $R \cup S$ obtained as follows:
  - Consider each pair of tuples $t_r$ from r and $t_s$ from s.
  - If $t_r$ and $t_s$ have the same value on each of the attributes in $R \cap S$, add a tuple t to the result, where
    - t has the same value as $t_r$ on R
    - t has the same value as $t_s$ on S

Natural Join Example
- Relations r, s: [Figure: relations r and s]
- Natural join $r \bowtie s$: [Figure: the result]

[Figures: Figure in-2.1; the schema diagram for the university database (repeated); sample instances of the instructor, course, prereq, department, section, and takes relations]
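The operators above can be sketched directly from their set-theoretic definitions. This Python sketch illustrates the definitions only; it is not how a DBMS evaluates queries. A relation is modeled as a tuple of attribute names plus a frozenset of tuples, so order is irrelevant and duplicates cannot occur. `natural_join` follows the pairwise definition literally, as a nested loop over t_r and t_s. All function names and the sample relation are hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple


class Relation:
    """A schema (attribute names) plus a *set* of tuples: no order, no duplicates."""

    def __init__(self, attrs: Iterable[str], rows: Iterable[tuple]):
        self.attrs: Tuple[str, ...] = tuple(attrs)
        self.rows = frozenset(tuple(row) for row in rows)


def select(r: Relation, pred: Callable[[Dict], bool]) -> Relation:
    """sigma_pred(r): keep the tuples for which pred(attribute -> value) is true."""
    return Relation(r.attrs, (row for row in r.rows if pred(dict(zip(r.attrs, row)))))


def project(r: Relation, attrs: Iterable[str]) -> Relation:
    """Pi_attrs(r): keep only the listed attributes; duplicate results collapse."""
    attrs = tuple(attrs)
    idx = [r.attrs.index(a) for a in attrs]
    return Relation(attrs, (tuple(row[i] for i in idx) for row in r.rows))


def product(r: Relation, s: Relation) -> Relation:
    """r x s: every tuple of r concatenated with every tuple of s."""
    if set(r.attrs) & set(s.attrs):
        raise ValueError("rename shared attributes before taking a product")
    return Relation(r.attrs + s.attrs, (tr + ts for tr in r.rows for ts in s.rows))


def _same_schema(r: Relation, s: Relation) -> None:
    if r.attrs != s.attrs:
        raise ValueError("set operations need relations with the same schema")


def union(r: Relation, s: Relation) -> Relation:
    _same_schema(r, s)
    return Relation(r.attrs, r.rows | s.rows)


def difference(r: Relation, s: Relation) -> Relation:
    _same_schema(r, s)
    return Relation(r.attrs, r.rows - s.rows)


def intersection(r: Relation, s: Relation) -> Relation:
    _same_schema(r, s)
    return Relation(r.attrs, r.rows & s.rows)


def natural_join(r: Relation, s: Relation) -> Relation:
    """Natural join on schema R u S, following the pairwise definition."""
    common = [a for a in r.attrs if a in s.attrs]     # R n S
    extra = [a for a in s.attrs if a not in r.attrs]  # attributes only in S
    out = []
    for tr in r.rows:                                 # consider each pair (t_r, t_s)
        dr = dict(zip(r.attrs, tr))
        for ts in s.rows:
            ds = dict(zip(s.attrs, ts))
            if all(dr[a] == ds[a] for a in common):   # agree on every common attribute
                out.append(tr + tuple(ds[a] for a in extra))
    return Relation(r.attrs + tuple(extra), out)


if __name__ == "__main__":
    r = Relation(("A", "B", "C", "D"),
                 [("a", "a", 1, 7), ("a", "b", 5, 7), ("b", "b", 12, 3), ("b", "b", 23, 10)])
    print(sorted(select(r, lambda t: t["A"] == t["B"] and t["D"] > 5).rows))
    # [('a', 'a', 1, 7), ('b', 'b', 23, 10)]
    print(sorted(project(r, ("A",)).rows))  # [('a',), ('b',)]
```

The nested loop makes the definition visible but costs |r| x |s| comparisons. Real systems use hash or sort-based join algorithms instead.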

Simple SQL
- Overview of the SQL query language
- Data definition
- Basic query structure
- Additional basic operations
- Set operations
- Null values
- Aggregate functions
- Nested subqueries
- Modification of the database

History
- IBM Sequel language developed as part of the System R project at the IBM San Jose Research Laboratory
- Renamed Structured Query Language (SQL)
- ANSI and ISO standard SQL:
  - SQL-86, SQL-89, SQL-92
  - SQL:1999, SQL:2003, SQL:2008
- Commercial systems offer most, if not all, SQL-92 features, plus varying feature sets from later standards and special proprietary features
  - Not all examples here may work on your particular system

Data Definition Language
The SQL data-definition language (DDL) allows the specification of information about relations, including:
- the schema for each relation
- the domain of values associated with each attribute
- integrity constraints
- and, as we will see later, other information such as
  - the set of indices to be maintained for each relation
  - security and authorization information for each relation
  - the physical storage structure of each relation on disk

Domain Types in SQL
- char(n). Fixed-length character string, with user-specified length n.
- varchar(n). Variable-length character strings, with user-specified maximum length n.
- int. Integer (a finite subset of the integers that is machine-dependent).
- smallint. Small integer (a machine-dependent subset of the integer domain type).
- numeric(p,d). Fixed-point number, with user-specified precision of p digits, with d digits to the right of the decimal point.
- real, double precision.
  Floating-point and double-precision floating-point numbers, with machine-dependent precision.
- float(n). Floating-point number, with user-specified precision of at least n digits.

Create Table Construct
- An SQL relation is defined using the create table command:
    create table r (
        A1 D1, A2 D2, ..., An Dn,
        (integrity-constraint1),
        ...,
        (integrity-constraintk))
  - r is the name of the relation
  - each Ai is an attribute name in the schema of relation r
  - Di is the data type of values in the domain of attribute Ai
- Example:
    create table instructor (
        ID        char(5),
        name      varchar(20) not null,
        dept_name varchar(20),
        salary    numeric(8,2));
- insert into instructor values ('10211', 'Smith', 'Biology', 66000);
- insert into instructor values ('10211', null, 'Biology', 66000);   -- rejected: name is declared not null

Integrity Constraints in Create Table
- not null
- primary key (A1, ..., An)
- foreign key (Am, ..., An) references r
- Example: declare ID as the primary key for instructor, and dept_name as a foreign key referencing department:
    create table instructor (
        ID        char(5),
        name      varchar(20) not null,
        dept_name varchar(20),
        salary    numeric(8,2),
        primary key (ID),
        foreign key (dept_name) references department);
- A primary key declaration on an attribute automatically ensures not null

And a Few More Relation Definitions
- create table student (
      ID        varchar(5),
      name      varchar(20) not null,
      dept_name varchar(20),
      tot_cred  numeric(3,0),
      primary key (ID),
      foreign key (dept_name) references department);
- create table takes (
      ID        varchar(5),
      course_id varchar(8),
      sec_id    varchar(8),
      semester  varchar(6),
      year      numeric(4,0),
      grade     varchar(2),
      primary key (ID, course_id, sec_id, semester, year),
      foreign key (ID) references student,
      foreign key (course_id, sec_id, semester, year) references section);
  - Note: sec_id can be dropped from the primary key above, to ensure that a student cannot be registered for two sections of the same course in the same semester

And more still
- create table course (
      course_id varchar(8) primary key,
      title     varchar(50),
      dept_name varchar(20),
      credits   numeric(2,0),
      foreign key (dept_name) references department);
  - A primary key declaration can be combined with the attribute declaration, as shown above

Drop and Alter Table Constructs
- drop table student
  - Deletes the table and its contents
- delete from student
  - Deletes all contents of the table, but retains the table
- alter table
  - alter table r add A D
    - where A is the name of the attribute to be added to relation r and D is the domain of A
    - All tuples in the relation are assigned null as the value for the new attribute
  - alter table r drop A
    - where A is the name of an attribute of relation r
    - Dropping of attributes is not supported by many databases

Basic Query Structure
- The SQL data-manipulation language (DML) provides the ability to query information, and to insert, delete, and update tuples
- A typical SQL query has the form:
    select A1, A2, ..., An
    from r1, r2, ..., rm
    where P
  - Ai represents an attribute
  - ri represents a relation
  - P is a predicate
- The result of an SQL query is a relation.

The select Clause
- The select clause lists the attributes desired in the result of a query
  - corresponds to the projection operation of the relational algebra
- Example: find the names of all instructors:
    select name
    from instructor
- NOTE: SQL names are case insensitive (i.e., you may use upper- or lower-case letters)
  - E.g., Name ≡ NAME ≡ name
  - Some people write SQL keywords in upper case; here they are written in lower case
- SQL allows duplicates in relations as well as in query results.
- To force the elimination of duplicates, insert the keyword distinct after select.
- Find the department names of all instructors, and remove duplicates:
    select distinct dept_name
    from instructor
- The keyword all specifies that duplicates should not be removed:
    select all dept_name
    from instructor
- An asterisk in the select clause denotes "all attributes":
    select *
    from instructor
- The select clause can contain arithmetic expressions involving the operations +, -, *, and /, and operating on constants or attributes of tuples.
- The query
    select ID, name, salary/12
    from instructor
  would return a relation that is the same as the instructor relation, except that the value of the attribute salary is divided by 12.

The where Clause
- The where clause specifies conditions that the result must satisfy
  - Corresponds to the selection predicate of the relational algebra
- To find all instructors in the Comp. Sci. department with salary > 80000:
    select name
    from instructor
    where dept_name = 'Comp. Sci.' and salary > 80000
- Comparison results can be combined using the logical connectives and, or, and not.
- Comparisons can be applied to results of arithmetic expressions.

The from Clause
- The from clause lists the relations involved in the query
  - Corresponds to the Cartesian product operation of the relational algebra
- Find the Cartesian product instructor × teaches:
    select *
    from instructor, teaches
  - generates every possible instructor-teaches pair, with all attributes from both relations
- The Cartesian product is not very useful directly, but it is useful combined with a where-clause condition (the selection operation in relational algebra)

Cartesian Product: instructor × teaches
[Figure: the instructor and teaches relations and their Cartesian product]

Joins
- For all instructors who have taught some course, find their names and the course ID of the courses they taught:
    select name, course_id
    from instructor, teaches
    where instructor.ID = teaches.ID
- Find the course ID, semester, year, and title of each course offered by the Comp. Sci. department:
    select section.course_id, semester, year, title
    from section, course
    where section.course_id = course.course_id
      and dept_name = 'Comp. Sci.'
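The constraint declarations and basic queries above can be exercised in SQLite through Python's sqlite3 module. This sketch rests on several assumptions:

- The rows are illustrative, and teaches is cut down to two columns.
- SQLite enforces foreign keys only after `pragma foreign_keys = on`.
- Unlike the SQL standard, SQLite does not make a non-integer primary key column implicitly not null unless the table is STRICT or WITHOUT ROWID. The sketch therefore tests only the uniqueness of the key.

`rejected` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite checks foreign keys only when enabled

# Schemas follow the slides (teaches is cut down); the rows are illustrative.
conn.executescript("""
create table department (
    dept_name varchar(20) primary key,
    building  varchar(15));
create table instructor (
    ID        char(5),
    name      varchar(20) not null,
    dept_name varchar(20),
    salary    numeric(8,2),
    primary key (ID),
    foreign key (dept_name) references department);
create table teaches (
    ID        char(5) references instructor,
    course_id varchar(8));
insert into department values ('Biology', 'Watson'), ('Comp. Sci.', 'Taylor');
insert into instructor values ('10211', 'Smith', 'Biology', 66000),
                              ('45565', 'Katz', 'Comp. Sci.', 75000),
                              ('83821', 'Brandt', 'Comp. Sci.', 92000);
insert into teaches values ('45565', 'CS-101'), ('45565', 'CS-319'), ('83821', 'CS-190');
""")


def rejected(sql: str) -> bool:
    """Run one statement; report whether an integrity constraint refused it."""
    try:
        with conn:  # a refused statement is rolled back
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


print(rejected("insert into instructor values ('99999', null, 'Biology', 66000)"))    # True: not null
print(rejected("insert into instructor values ('10211', 'Jones', 'Biology', 50000)")) # True: duplicate key
print(rejected("insert into instructor values ('77777', 'Lee', 'Music', 50000)"))     # True: no such dept

depts = conn.execute("select distinct dept_name from instructor order by dept_name").fetchall()
rich_cs = conn.execute(
    "select name from instructor where dept_name = 'Comp. Sci.' and salary > 80000").fetchall()
taught = conn.execute("""
    select name, course_id
    from instructor, teaches
    where instructor.ID = teaches.ID
    order by name, course_id""").fetchall()
print(depts, rich_cs, taught)
```

The `order by` clauses are there only to make the output deterministic. Without them, SQL does not guarantee any row order.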

Natural Join
- Natural join matches tuples with the same values for all common attributes, and retains only one copy of each common column:
    select *
    from instructor natural join teaches;

Natural Join Example
- List the names of instructors along with the course ID of the courses that they taught. Three equivalent formulations:
    select name, course_id
    from instructor, teaches
    where instructor.ID = teaches.ID;

    select name, course_id
    from instructor natural join teaches;

    select name, course_id
    from instructor join teaches on instructor.ID = teaches.ID;
- Danger in natural join: beware of unrelated attributes with the same name, which get equated incorrectly
- List the names of instructors along with the titles of courses that they teach
  - Incorrect version (also equates course.dept_name with instructor.dept_name):
      select name, title
      from instructor natural join teaches natural join course;
  - Correct version:
      select name, title
      from instructor natural join teaches, course
      where teaches.course_id = course.course_id;
  - Another correct version:
      select name, title
      from (instructor natural join teaches) join course using (course_id);

The Rename Operation
- SQL allows renaming relations and attributes using the as clause:
    old-name as new-name
- E.g.:
    select ID, name, salary/12 as monthly_salary
    from instructor
- Find the names of all instructors who have a higher salary than some instructor in 'Comp. Sci.':
    select distinct T.name
    from instructor as T, instructor as S
    where T.salary > S.salary and S.dept_name = 'Comp. Sci.'
- The keyword as is optional and may be omitted: instructor as T ≡ instructor T
  - The keyword as must be omitted for relation aliases in Oracle

String Operations
- SQL includes a string-matching operator for comparisons on character strings. The operator like uses patterns that are described using two special characters:
  - percent (%): the % character matches any substring
  - underscore (_): the _ character matches any single character
- Find the names of all instructors whose name includes the substring "dar":
    select name
    from instructor
    where name like '%dar%'
- Match the string "100 %":
    like '100 \%' escape '\'
- Patterns are case sensitive.
- Pattern-matching examples:
  - 'Intro%' matches any string beginning with "Intro"
  - '%Comp%' matches any string containing "Comp" as a substring
  - '___' matches any string of exactly three characters
  - '___%' matches any string of at least three characters
- SQL supports a variety of string operations, such as
  - concatenation (using "||")
  - converting from upper to lower case (and vice versa)
  - finding string length, extracting substrings, etc.

Ordering the Display of Tuples
- List in alphabetic order the names of all instructors:
    select distinct name
    from instructor
    order by name
- We may specify desc for descending order or asc for ascending order, for each attribute; ascending order is the default.
  - Example: order by name desc
- We can sort on multiple attributes and on renamed attributes:
    select name, ceiling(salary/1000) as salary_in_thousands
    from instructor
    order by salary_in_thousands desc, name asc

Where Clause Predicates
- SQL includes a between comparison operator
- Example: find the names of all instructors with salary between 90,000 and 100,000 (that is, ≥ 90,000 and ≤ 100,000):
    select name
    from instructor
    where salary between 90000 and 100000
- Tuple comparison:
    select name, course_id
    from instructor, teaches
    where (instructor.ID, dept_name) = (teaches.ID, 'Biology');
  which is equivalent to
    select name, course_id
    from instructor, teaches
    where instructor.ID = teaches.ID and dept_name = 'Biology';

Duplicates
- In relations with duplicates, SQL can define how many copies of tuples appear in the result.
- Multiset versions of some of the relational algebra operators, given multiset relations $r_1$ and $r_2$:
  1. $\sigma_\theta(r_1)$: if there are $c_1$ copies of tuple $t_1$ in $r_1$, and $t_1$ satisfies selection $\sigma_\theta$, then there are $c_1$ copies of $t_1$ in $\sigma_\theta(r_1)$.
  2. $\Pi_A(r_1)$: for each copy of tuple $t_1$ in $r_1$, there is a copy of tuple $\Pi_A(t_1)$ in $\Pi_A(r_1)$, where $\Pi_A(t_1)$ denotes the projection of the single tuple $t_1$.
  3. $r_1 \times r_2$: if there are $c_1$ copies of tuple $t_1$ in $r_1$ and $c_2$ copies of tuple $t_2$ in $r_2$, there are $c_1 \times c_2$ copies of the tuple $t_1.t_2$ in $r_1 \times r_2$.
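The three multiset rules above can be sketched with Python's `collections.Counter`, which maps each tuple to its number of copies. To keep the sketch short, attributes are addressed by position; this is an assumption of the sketch. The function names `m_select`, `m_project`, and `m_product` are hypothetical.

```python
from collections import Counter
from itertools import product as pairs
from typing import Callable, Sequence

# A multiset relation is a Counter mapping each tuple to its number of copies.
# Attributes are addressed by position (an assumption of this sketch).


def m_select(r: Counter, theta: Callable[[tuple], bool]) -> Counter:
    """Rule 1: a tuple with c copies that satisfies theta keeps all c copies."""
    return Counter({t: c for t, c in r.items() if theta(t)})


def m_project(r: Counter, positions: Sequence[int]) -> Counter:
    """Rule 2: every copy of t contributes one copy of its projection."""
    out: Counter = Counter()
    for t, c in r.items():
        out[tuple(t[i] for i in positions)] += c
    return out


def m_product(r1: Counter, r2: Counter) -> Counter:
    """Rule 3: c1 copies of t1 and c2 copies of t2 give c1 * c2 copies of t1.t2."""
    out: Counter = Counter()
    for (t1, c1), (t2, c2) in pairs(r1.items(), r2.items()):
        out[t1 + t2] += c1 * c2
    return out


if __name__ == "__main__":
    r = Counter({(1, "x"): 2, (2, "y"): 1})   # (1, "x") appears twice
    print(m_select(r, lambda t: t[0] == 1))   # Counter({(1, 'x'): 2})
    print(m_project(r, [1]))                  # Counter({('x',): 2, ('y',): 1})
```

Counter's own operators also match the bag set operations: `+`, `&`, and `-` yield m + n, min(m, n), and max(0, m - n) copies of a tuple.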
- Example: suppose multiset relations $r_1(A, B)$ and $r_2(C)$ are as follows:
    $r_1 = \{(1, a), (2, a)\}$ and $r_2 = \{(2), (3), (3)\}$
- Then $\Pi_B(r_1)$ would be $\{(a), (a)\}$, while $\Pi_B(r_1) \times r_2$ would be
    $\{(a,2), (a,2), (a,3), (a,3), (a,3), (a,3)\}$
- SQL duplicate semantics:
    select A1, A2, ..., An
    from r1, r2, ..., rm
    where P
  is equivalent to the multiset version of the expression
    $\Pi_{A_1, A_2, \ldots, A_n}(\sigma_P(r_1 \times r_2 \times \cdots \times r_m))$

Set Operations
- Find courses that ran in Fall 2009 or in Spring 2010:
    (select course_id from section where semester = 'Fall' and year = 2009)
    union
    (select course_id from section where semester = 'Spring' and year = 2010)
- Find courses that ran in Fall 2009 and in Spring 2010:
    (select course_id from section where semester = 'Fall' and year = 2009)
    intersect
    (select course_id from section where semester = 'Spring' and year = 2010)
- Find courses that ran in Fall 2009 but not in Spring 2010:
    (select course_id from section where semester = 'Fall' and year = 2009)
    except
    (select course_id from section where semester = 'Spring' and year = 2010)
- The set operations union, intersect, and except each automatically eliminate duplicates
- To retain all duplicates, use the corresponding multiset versions union all, intersect all, and except all. Suppose a tuple occurs m times in r and n times in s; then it occurs:
  - m + n times in r union all s
  - min(m, n) times in r intersect all s
  - max(0, m - n) times in r except all s

Null Values
- It is possible for tuples to have a null value, denoted by null, for some of their attributes
- null signifies an unknown value or that a value does not exist.
- The result of any arithmetic expression involving null is null
  - Example: 5 + null returns null
- The predicate is null can be used to check for null values.
  - Example: Find all instructors whose salary is null.
        select name
        from instructor
        where salary is null

Null Values and Three-Valued Logic
- Any comparison with null returns unknown
  - Example: 5 < null or null <> null or null = null
- Three-valued logic using the truth value unknown:
  - OR: (unknown or true) = true, (unknown or false) = unknown, (unknown or unknown) = unknown
  - AND: (true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown
  - NOT: (not unknown) = unknown
  - "P is unknown" evaluates to true if predicate P evaluates to unknown
- The result of a where clause predicate is treated as false if it evaluates to unknown

Aggregate Functions
- These functions operate on the multiset of values of a column of a relation, and return a value
    avg: average value
    min: minimum value
    max: maximum value
    sum: sum of values
    count: number of values

Aggregate Functions (Cont.)
- Find the average salary of instructors in the Computer Science department
    select avg (salary)
    from instructor
    where dept_name = 'Comp. Sci.';
- Find the total number of instructors who teach a course in the Spring 2010 semester
    select count (distinct ID)
    from teaches
    where semester = 'Spring' and year = 2010;
- Find the number of tuples in the course relation
    select count (*)
    from course;

Aggregate Functions – Group By
- Find the average salary of instructors in each department
    select dept_name, avg (salary)
    from instructor
    group by dept_name;
  - Note: departments with no instructor will not appear in the result

Aggregation (Cont.)
- Attributes in the select clause outside of aggregate functions must appear in the group by list
    /* erroneous query */
    select dept_name, ID, avg (salary)
    from instructor
    group by dept_name;

Aggregate Functions – Having Clause
- Find the names and average salaries of all departments whose average salary is greater than 42000
    select dept_name, avg (salary)
    from instructor
    group by dept_name
    having avg (salary) > 42000;
- Note: predicates in the having clause are applied after the formation of groups, whereas predicates in the where clause are applied before forming groups

Null Values and Aggregates
- Total all salaries
    select sum (salary)
    from instructor
  - The above statement ignores null amounts
  - The result is null if there is no non-null amount
- All aggregate operations except count(*) ignore tuples with null values on the aggregated attributes
- What if the collection has only null values?
  - count returns 0
  - all other aggregates return null

Nested Subqueries
- SQL provides a mechanism for the nesting of subqueries.
- A subquery is a select-from-where expression that is nested within another query.
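You can see the null handling and aggregate rules above in action by running them on SQLite through Python's standard `sqlite3` module. SQLite represents true, false and unknown as 1, 0 and NULL (`None` in Python). The table contents are made up for illustration. The queries show three things: arithmetic with null, the three-valued connectives, which aggregates ignore nulls, and that an unknown where/having predicate filters the row out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


# Arithmetic and comparisons involving null; SQLite encodes unknown as NULL (None).
plus_null = scalar("select 5 + null")        # None
null_eq_null = scalar("select null = null")  # None (unknown, not true)

# Three-valued logic: true = 1, false = 0, unknown = None.
truth = {
    "unknown or true": scalar("select null or 1"),     # 1
    "unknown or false": scalar("select null or 0"),    # None
    "false and unknown": scalar("select 0 and null"),  # 0
    "true and unknown": scalar("select 1 and null"),   # None
    "not unknown": scalar("select not (null)"),        # None
}

conn.execute("create table instructor (ID text, dept_name text, salary integer)")
conn.executemany(
    "insert into instructor values (?, ?, ?)",
    [("1", "Comp. Sci.", 60000), ("2", "Comp. Sci.", None),
     ("3", "Finance", 90000), ("4", "History", None)],
)

# count(*) counts tuples; count(salary), sum and avg ignore nulls.
per_dept = conn.execute(
    """select dept_name, count(*), count(salary), sum(salary), avg(salary)
       from instructor group by dept_name order by dept_name"""
).fetchall()

# A where/having predicate that evaluates to unknown is treated as false.
above = scalar("select count(*) from instructor where salary > 50000")
rich = conn.execute(
    "select dept_name from instructor group by dept_name having avg(salary) > 70000"
).fetchall()
print(per_dept, above, rich)
```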
- A common use of subqueries is to perform tests for set membership, set comparisons, and set cardinality.

Example Query
- Find courses offered in Fall 2009 and in Spring 2010
    select distinct course_id
    from section
    where semester = 'Fall' and year = 2009 and
          course_id in (select course_id
                        from section
                        where semester = 'Spring' and year = 2010);
- Find courses offered in Fall 2009 but not in Spring 2010
    select distinct course_id
    from section
    where semester = 'Fall' and year = 2009 and
          course_id not in (select course_id
                            from section
                            where semester = 'Spring' and year = 2010);

Example Query
- Find the total number of (distinct) students who have taken course sections taught by the instructor with a given ID
    select count (distinct ID)
    from takes
    where course_id in (select course_id
                        from teaches
                        where teaches.ID = 14365);
- Note: try it without distinct

Set Comparison
- Find names of instructors with salary greater than that of some (at least one) instructor in the Biology department.
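The membership queries above can be checked against the set-operation versions from the earlier slides. This minimal sketch runs both forms on SQLite via Python's `sqlite3` module, using a made-up `section` instance. SQLite does not accept parentheses around the operands of a compound select, so the `intersect`/`except` forms below omit them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table section (course_id text, sec_id text, semester text, year integer)")
conn.executemany(
    "insert into section values (?, ?, ?, ?)",
    [("CS-101", "1", "Fall", 2009), ("CS-347", "1", "Fall", 2009),
     ("PHY-101", "1", "Fall", 2009), ("CS-101", "1", "Spring", 2010),
     ("CS-101", "2", "Spring", 2010), ("CS-319", "1", "Spring", 2010)],
)

fall = "select course_id from section where semester = 'Fall' and year = 2009"
spring = "select course_id from section where semester = 'Spring' and year = 2010"

# Set membership with in / not in ...
both_in = conn.execute(
    "select distinct course_id from section where semester = 'Fall' and year = 2009 "
    f"and course_id in ({spring}) order by course_id").fetchall()
fall_only_in = conn.execute(
    "select distinct course_id from section where semester = 'Fall' and year = 2009 "
    f"and course_id not in ({spring}) order by course_id").fetchall()

# ... gives the same answers as intersect / except (which also remove duplicates).
both_set = conn.execute(f"{fall} intersect {spring} order by course_id").fetchall()
fall_only_set = conn.execute(f"{fall} except {spring} order by course_id").fetchall()

print(both_in)       # [('CS-101',)]
print(fall_only_in)  # [('CS-347',), ('PHY-101',)]
```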
    select distinct T.name
    from instructor as T, instructor as S
    where T.salary > S.salary and S.dept_name = 'Biology';
- Same query using > some clause
    select name
    from instructor
    where salary > some (select salary
                         from instructor
                         where dept_name = 'Biology');

Definition of Some Clause
- $F \mathrel{\langle \mathit{comp} \rangle} \textbf{some}\ r \iff \exists\, t \in r \text{ such that } (F \mathrel{\langle \mathit{comp} \rangle} t)$, where $\langle \mathit{comp} \rangle$ can be $<, \le, >, =, \ne$
- (5 < some {0, 5, 6}) = true   (read: 5 < some tuple in the relation)
- (5 < some {0, 5}) = false
- (5 = some {0, 5}) = true
- (5 ≠ some {0, 5}) = true   (since 0 ≠ 5)
- (= some) ≡ in
- However, (≠ some) ≢ not in

Example Query
- Find the names of all instructors whose salary is greater than the salary of all instructors in the Biology department.
    select name
    from instructor
    where salary > all (select salary
                        from instructor
                        where dept_name = 'Biology');

Definition of all Clause
- $F \mathrel{\langle \mathit{comp} \rangle} \textbf{all}\ r \iff \forall\, t \in r\ (F \mathrel{\langle \mathit{comp} \rangle} t)$
- (5 < all {0, 5, 6}) = false
- (5 < all {6, 10}) = true
- (5 = all {4, 5}) = false
- (5 ≠ all {4, 6}) = true   (since 5 ≠ 4 and 5 ≠ 6)
- (≠ all) ≡ not in
- However, (= all) ≢ in

Test for Empty Relations
- The exists construct returns the value true if the argument subquery is nonempty.
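The definitions of some and all above amount to an existential and a universal test over the subquery result. This plain-Python sketch encodes them with `any` and `all` and replays the slide examples. Two simplifications apply: the helper names `some` and `all_` are hypothetical, and ordinary two-valued Python comparisons stand in for SQL, so nulls are not modelled. On an empty relation the sketch gives false for some and true for all, as SQL does.

```python
import operator


def some(f, comp, r):
    """F <comp> some r: true iff at least one t in r satisfies F <comp> t."""
    return any(comp(f, t) for t in r)


def all_(f, comp, r):
    """F <comp> all r: true iff every t in r satisfies F <comp> t."""
    return all(comp(f, t) for t in r)


lt, eq, ne = operator.lt, operator.eq, operator.ne

some_examples = [some(5, lt, [0, 5, 6]), some(5, lt, [0, 5]),
                 some(5, eq, [0, 5]), some(5, ne, [0, 5])]
all_examples = [all_(5, lt, [0, 5, 6]), all_(5, lt, [6, 10]),
                all_(5, eq, [4, 5]), all_(5, ne, [4, 6])]
print(some_examples)  # [True, False, True, True]
print(all_examples)   # [False, True, False, True]

# (= some) behaves like in, but (<> some) is not the same as not in:
print(some(5, ne, [0, 5]), 5 not in [0, 5])  # True False
# (<> all) behaves like not in, but (= all) is not the same as in:
print(all_(5, eq, [4, 5]), 5 in [4, 5])      # False True
```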
- $\textbf{exists}\ r \iff r \neq \emptyset$
- $\textbf{not exists}\ r \iff r = \emptyset$

Correlation Variables
- Yet another way of specifying the query "Find all courses taught in both the Fall 2009 semester and in the Spring 2010 semester"
    select course_id
    from section as S
    where semester = 'Fall' and year = 2009 and
          exists (select *
                  from section as T
                  where semester = 'Spring' and year = 2010
                        and S.course_id = T.course_id);
- Correlated subquery
- Correlation name or correlation variable

Not Exists
- Find all students who have taken all courses offered in the Biology department.
    select distinct S.ID, S.name
    from student as S
    where not exists ((select course_id
                       from course
                       where dept_name = 'Biology')
                      except
                      (select T.course_id
                       from takes as T
                       where S.ID = T.ID));
- Note that $X - Y = \emptyset \iff X \subseteq Y$
- Note: cannot write this query using = all and its variants

Test for Absence of Duplicate Tuples
- The unique construct tests whether a subquery has any duplicate tuples in its result.
  - (Evaluates to "true" on an empty set)
- Find all courses that were offered at most once in 2008
    select T.course_id, T.title
    from course as T
    where unique (select R.course_id
                  from section as R
                  where T.course_id = R.course_id
                        and R.year = 2008);
- Alternative without unique:
    select T.course_id, T.title
    from course as T
    where T.course_id in (select R.course_id
                          from section as R
                          where R.year = 2008
                          group by R.course_id
                          having count(*) = 1);
  - Note: this alternative returns only courses offered exactly once in 2008; unlike the unique version, it omits courses with no section at all in 2008

Subqueries in the From Clause
- SQL allows a subquery expression to be used in the from clause
- Find the average instructors' salaries of those departments where the average salary is greater than $42,000.
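The "all Biology courses" query is relational division, and the unique test can be emulated by counting. This sketch runs both on SQLite via Python's `sqlite3`, with a small made-up instance. Two adjustments are needed for SQLite. First, SQLite rejects parentheses around the operands of `except`, so they are dropped. Second, SQLite has no `unique` predicate, so "at most once" is written with a correlated `count(*) <= 1`, which is an assumed stand-in rather than the slide's syntax. The last query shows how the `having count(*) = 1` variant differs from the unique version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table student (ID text, name text);
    create table course (course_id text, title text, dept_name text);
    create table takes (ID text, course_id text);
    create table section (course_id text, sec_id text, year integer);
    insert into student values ('1', 'Alice'), ('2', 'Bob'), ('3', 'Carol');
    insert into course values ('BIO-101', 'Intro. to Biology', 'Biology'),
                              ('BIO-301', 'Genetics', 'Biology'),
                              ('CS-101', 'Intro. to Computer Science', 'Comp. Sci.');
    insert into takes values ('1', 'BIO-101'), ('1', 'BIO-301'), ('1', 'CS-101'),
                             ('2', 'BIO-101');
    insert into section values ('CS-101', '1', 2008), ('CS-101', '2', 2008),
                               ('BIO-101', '1', 2008);
""")

# Division: keep students for whom (Biology courses) except (courses taken) is empty.
all_biology = conn.execute("""
    select distinct S.ID, S.name
    from student as S
    where not exists (select course_id from course where dept_name = 'Biology'
                      except
                      select T.course_id from takes as T where S.ID = T.ID)
    order by S.ID""").fetchall()

# Stand-in for unique: at most one 2008 section (courses with none are included).
at_most_once = conn.execute("""
    select T.course_id from course as T
    where (select count(*) from section as R
           where R.course_id = T.course_id and R.year = 2008) <= 1
    order by T.course_id""").fetchall()

# The "having count(*) = 1" variant returns only courses offered exactly once.
exactly_once = conn.execute("""
    select T.course_id from course as T
    where T.course_id in (select R.course_id from section as R where R.year = 2008
                          group by R.course_id having count(*) = 1)
    order by T.course_id""").fetchall()

print(all_biology, at_most_once, exactly_once)
```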
    select dept_name, avg_salary
    from (select dept_name, avg (salary) as avg_salary
          from instructor
          group by dept_name)
    where avg_salary > 42000;
- Note that we do not need to use the having clause
- Another way to write the above query
    select dept_name, avg_salary
    from (select dept_name, avg (salary)
          from instructor
          group by dept_name)
          as dept_avg (dept_name, avg_salary)
    where avg_salary > 42000;

Subqueries in the From Clause (Cont.)
- And yet another way to write it: the lateral clause
    select name, salary, avg_salary
    from instructor I1, lateral (select avg(salary) as avg_salary
                                 from instructor I2
                                 where I2.dept_name = I1.dept_name);
- The lateral clause permits the later part of the from clause (after the lateral keyword) to access correlation variables from the earlier part.
- Note: lateral is part of the SQL standard, but is not supported on many database systems; some databases, such as SQL Server, offer alternative syntax (exercise: find the alternative syntax)

With Clause
- The with clause provides a way of defining a temporary view whose definition is available only to the query in which the with clause occurs.
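As the note says, the having clause is not needed once the grouping happens inside a from-clause subquery. The sketch below runs both forms on SQLite via Python's `sqlite3` and checks that they agree. The instructor data is made up, and the sketch names the derived table `dept_avg` with a plain alias instead of the column-list form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID text, dept_name text, salary integer)")
conn.executemany(
    "insert into instructor values (?, ?, ?)",
    [("1", "Comp. Sci.", 65000), ("2", "Comp. Sci.", 75000),
     ("3", "Finance", 90000), ("4", "Finance", 80000),
     ("5", "History", 62000), ("6", "History", 40000),
     ("7", "Music", 40000)],
)

# Subquery in the from clause: the filter on the per-department average is a plain where.
from_subquery = conn.execute("""
    select dept_name, avg_salary
    from (select dept_name, avg(salary) as avg_salary
          from instructor
          group by dept_name) as dept_avg
    where avg_salary > 42000
    order by dept_name""").fetchall()

# The same result with having.
with_having = conn.execute("""
    select dept_name, avg(salary)
    from instructor
    group by dept_name
    having avg(salary) > 42000
    order by dept_name""").fetchall()

print(from_subquery)  # [('Comp. Sci.', 70000.0), ('Finance', 85000.0), ('History', 51000.0)]
```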
- Find all departments with the maximum budget
    with max_budget (value) as
         (select max(budget)
          from department)
    select dept_name, budget
    from department, max_budget
    where department.budget = max_budget.value;

Complex Queries using With Clause
- The with clause is very useful for writing complex queries
- Supported by most database systems, with minor syntax variations
- Find all departments where the total salary is greater than or equal to the average of the total salary at all departments
    with dept_total (dept_name, value) as
         (select dept_name, sum(salary)
          from instructor
          group by dept_name),
    dept_total_avg (value) as
         (select avg(value)
          from dept_total)
    select dept_name
    from dept_total, dept_total_avg
    where dept_total.value >= dept_total_avg.value;

Scalar Subquery
- A scalar subquery is one which is used where a single value is expected
- E.g.
    select dept_name,
           (select count(*)
            from instructor
            where department.dept_name = instructor.dept_name)
           as num_instructors
    from department;
- E.g.
    select name
    from instructor
    where salary * 10 > (select budget
                         from department
                         where department.dept_name = instructor.dept_name);
- Runtime error if the subquery returns more than one result tuple

Modification of the Database
- Deletion of tuples from a given relation
- Insertion of new tuples into a given relation
- Updating values in some tuples in a given relation

Modification of the Database – Deletion
- Delete all instructors
    delete from instructor;
- Delete all instructors from the Finance department
    delete from instructor
    where dept_name = 'Finance';
- Delete all tuples in the instructor relation for those instructors associated with a department located in the Watson building.
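The with-clause and scalar-subquery examples above can be tried on SQLite via Python's `sqlite3`. The department and instructor rows below are made up. One caveat applies: SQLite does not raise the runtime error mentioned on the slide when a scalar subquery returns several rows; it silently uses the first row. This sketch therefore only uses subqueries that return exactly one value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table department (dept_name text, budget integer);
    create table instructor (ID text, dept_name text, salary integer);
    insert into department values ('Comp. Sci.', 100000), ('Finance', 120000),
                                  ('History', 50000), ('Music', 80000);
    insert into instructor values ('1', 'Comp. Sci.', 65000), ('2', 'Comp. Sci.', 75000),
                                  ('3', 'Finance', 90000), ('4', 'History', 60000);
""")

# with clause: a temporary view visible only to this query.
max_budget = conn.execute("""
    with max_budget (value) as (select max(budget) from department)
    select dept_name, budget
    from department, max_budget
    where department.budget = max_budget.value""").fetchall()

# Two chained with definitions: totals per department, then the average of those totals.
above_avg_total = conn.execute("""
    with dept_total (dept_name, value) as
         (select dept_name, sum(salary) from instructor group by dept_name),
         dept_total_avg (value) as
         (select avg(value) from dept_total)
    select dept_name
    from dept_total, dept_total_avg
    where dept_total.value >= dept_total_avg.value""").fetchall()

# Scalar subquery in the select clause: a department without instructors gets 0.
num_instructors = conn.execute("""
    select dept_name,
           (select count(*) from instructor
            where department.dept_name = instructor.dept_name) as num_instructors
    from department
    order by dept_name""").fetchall()

print(max_budget, above_avg_total, num_instructors)
```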
    delete from instructor
    where dept_name in (select dept_name
                        from department
                        where building = 'Watson');

Deletion (Cont.)
- Delete all instructors whose salary is less than the average salary of instructors
    delete from instructor
    where salary < (select avg (salary)
                    from instructor);
  - Problem: as we delete tuples from instructor, the average salary changes
  - Solution used in SQL:
    1. First, compute the average salary and find all tuples to delete
    2. Next, delete all tuples found above (without recomputing avg or retesting the tuples)

Modification of the Database – Insertion
- Add a new tuple to course
    insert into course
    values ('CS-437', 'Database Systems', 'Comp. Sci.', 4);
- or equivalently
    insert into course (course_id, title, dept_name, credits)
    values ('CS-437', 'Database Systems', 'Comp. Sci.', 4);
- Add a new tuple to student with tot_cred set to null
    insert into student
    values ('3003', 'Green', 'Finance', null);

Insertion (Cont.)
- Add all instructors to the student relation with tot_cred set to 0
    insert into student
    select ID, name, dept_name, 0
    from instructor;
- The select from where statement is evaluated fully before any of its results are inserted into the relation (otherwise, queries like
    insert into table1 select * from table1
  would cause problems if table1 did not have any primary key defined).
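Both "evaluate first, then modify" rules above can be observed on SQLite via Python's `sqlite3`. The salaries are chosen so that the two strategies differ. SQL semantics compare every tuple against the original average of 55000, so two tuples remain. The hand-written loop instead re-tests against the current average after each deletion (the "problem" case), and it ends with a single tuple. The last part shows `insert into table1 select * from table1` doubling the table once rather than looping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID text, salary integer)")
conn.executemany("insert into instructor values (?, ?)",
                 [("1", 40000), ("2", 50000), ("3", 60000), ("4", 70000)])

# The average (55000) is computed once, before any tuple is deleted.
conn.execute("delete from instructor where salary < (select avg(salary) from instructor)")
remaining = [s for (s,) in conn.execute("select salary from instructor order by salary")]
print(remaining)  # [60000, 70000]

# For comparison: re-testing each tuple against the *current* average,
# one deletion at a time, keeps deleting until only 70000 is left.
salaries = [40000, 50000, 60000, 70000]
for s in list(salaries):
    if s < sum(salaries) / len(salaries):
        salaries.remove(s)
print(salaries)  # [70000]

# insert ... select over the same table: the select is evaluated fully first,
# so a table without a primary key simply doubles once.
conn.execute("create table table1 (x integer)")
conn.executemany("insert into table1 values (?)", [(1,), (2,)])
conn.execute("insert into table1 select * from table1")
doubled = [x for (x,) in conn.execute("select x from table1 order by x")]
print(doubled)  # [1, 1, 2, 2]
```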
Modification of the Database – Updates
- Increase salaries of instructors whose salary is over $100,000 by 3%, and give all others a 5% raise
  - Write two update statements:
        update instructor
        set salary = salary * 1.03
        where salary > 100000;

        update instructor
        set salary = salary * 1.05
        where salary <= 100000;
  - The order is important
  - Can be done better using the case statement (next slide)

Case Statement for Conditional Updates
- Same query as before but with case statement
    update instructor
    set salary = case
                   when salary <= 100000 then salary * 1.05
                   else salary * 1.03
                 end;

Updates with Scalar Subqueries
- Recompute and update the tot_cred value for all students
    update student
    set tot_cred = (select sum(credits)
                    from takes natural join course
                    where student.ID = takes.ID and
                          takes.grade <> 'F' and
                          takes.grade is not null);
  - Exercise: replace the natural join (SQL Server does not support it)
- Sets tot_cred to null for students who have not taken any course
  - Just insert a student who has taken no course into the student table and check it
- Instead use:
    update student
    set tot_cred = (select case
                             when sum(credits) is not null then sum(credits)
                             else 0
                           end
                    from takes join course on takes.course_id = course.course_id
                    where student.ID = takes.ID and
                          takes.grade <> 'F' and
                          takes.grade is not null);
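The claim that "the order is important" can be checked on SQLite via Python's `sqlite3`. Apply the two updates in both orders, then the case version, to two made-up salaries. With the 5% update first, the 98000 salary crosses the 100000 threshold and then receives the 3% raise as well. Salaries are rounded to cents because the multiplications are done in floating point. The helper `run` is a hypothetical test harness, not part of the slides.

```python
import sqlite3

RAISE_3 = "update instructor set salary = salary * 1.03 where salary > 100000"
RAISE_5 = "update instructor set salary = salary * 1.05 where salary <= 100000"
CASE = """update instructor
          set salary = case
                         when salary <= 100000 then salary * 1.05
                         else salary * 1.03
                       end"""


def run(statements, salaries=(98000, 150000)):
    """Apply the update statements to a fresh instructor table; return rounded salaries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table instructor (ID text, salary real)")
    conn.executemany("insert into instructor values (?, ?)",
                     [(str(i), s) for i, s in enumerate(salaries)])
    for statement in statements:
        conn.execute(statement)
    return [round(s, 2) for (s,) in conn.execute("select salary from instructor order by ID")]


print(run([RAISE_3, RAISE_5]))  # [102900.0, 154500.0]  slide order: correct
print(run([RAISE_5, RAISE_3]))  # [105987.0, 154500.0]  98000 gets both raises
print(run([CASE]))              # [102900.0, 154500.0]
```

The tot_cred fix on the slide can equivalently be written with coalesce(sum(credits), 0), since coalesce returns its first non-null argument.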
Intermediate SQL
- Join Expressions
- Views
- Transactions
- Integrity Constraints
- SQL Data Types and Schemas
- Authorization

Joined Relations
- Join operations take two relations and return as a result another relation.
- A join operation is a Cartesian product which requires that tuples in the two relations match (under some condition). It also specifies the attributes that are present in the result of the join
- The join operations are typically used as subquery expressions in the from clause

Join operations – Example
- Relation course
- Relation prereq
- Observe that prereq information is missing for CS-315 and course information is missing for CS-437

Outer Join
- An extension of the join operation that avoids loss of information.
- Computes the join and then adds tuples from one relation that do not match tuples in the other relation to the result of the join.
- Uses null values.

Left Outer Join
- course natural left outer join prereq
    select *
    from course left outer join prereq
         on course.course_id = prereq.course_id;

Why Doesn't SQL Server Support Natural Join Syntax?
- Natural join is very nice for writing quick queries, and other major databases, such as MySQL and Oracle, do support it. However, natural joins have some downsides:
  1. Because natural joins are implicit, there is no way to see which columns will be used in the join. You might not get what you think you're getting.
  2. If a column name or type is altered, or the column is removed from one of the tables, the next time the SELECT statement is run the join will break.
Right Outer Join
- course natural right outer join prereq
    select *
    from course right outer join prereq
         on course.course_id = prereq.course_id;

Full Outer Join
- course natural full outer join prereq
    select *
    from course full outer join prereq
         on course.course_id = prereq.course_id;

Joined Relations
- Join operations take two relations and return as a result another relation.
- These additional operations are typically used as subquery expressions in the from clause
- Join condition – defines which tuples in the two relations match, and what attributes are present in the result of the join.
- Join type – defines how tuples in each relation that do not match any tuple in the other relation (based on the join condition) are treated.

Various forms of join conditions
- course inner join prereq on course.course_id = prereq.course_id
- What is the difference between the above and a natural join?
- course left outer join prereq on course.course_id = prereq.course_id

Various forms of join conditions
- course natural right outer join prereq
- course full outer join prereq using (course_id)

Views
- In some cases, it is not desirable for all users to see the entire logical model (that is, all the actual relations stored in the database).
- Consider a person who needs to know an instructor's name and department, but not the salary. This person should see a relation described, in SQL, by
    select ID, name, dept_name
    from instructor
- A view provides a mechanism to hide certain data from the view of certain users.
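The original course and prereq tables were images and did not survive. The sketch below therefore uses a made-up instance that mirrors the slide's observation: CS-315 has no prereq tuple and CS-437 has no course tuple. It runs the inner and left outer joins on SQLite via Python's `sqlite3`. Older SQLite releases lack right and full outer joins, so the full outer join is written portably. The query is a left join plus the unmatched prereq tuples, with prereq.course_id filling in the join column as natural full outer join would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table course (course_id text, title text);
    create table prereq (course_id text, prereq_id text);
    insert into course values ('BIO-301', 'Genetics'), ('CS-190', 'Game Design'),
                              ('CS-315', 'Robotics');
    insert into prereq values ('BIO-301', 'BIO-101'), ('CS-190', 'CS-101'),
                              ('CS-437', 'CS-101');
""")

inner = conn.execute("""
    select course.course_id, title, prereq_id
    from course inner join prereq on course.course_id = prereq.course_id
    order by course.course_id""").fetchall()

# Left outer join: CS-315 has no prereq tuple, so prereq_id is padded with null.
left = conn.execute("""
    select course.course_id, title, prereq_id
    from course left outer join prereq on course.course_id = prereq.course_id
    order by course.course_id""").fetchall()

# Full outer join emulated: left join plus prereq tuples with no matching course.
full = conn.execute("""
    select course.course_id, title, prereq_id
    from course left outer join prereq on course.course_id = prereq.course_id
    union all
    select prereq.course_id, null, prereq_id
    from prereq left outer join course on course.course_id = prereq.course_id
    where course.course_id is null
    order by 1""").fetchall()

print(full)
```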
- Any relation that is not of the conceptual model but is made visible to a user as a "virtual relation" is called a view.

View Definition
- A view is defined using the create view statement, which has the form
    create view v as < query expression >
  where <query expression> is any legal SQL expression. The view name is represented by v.
- Once a view is defined, the view name can be used to refer to the virtual relation that the view generates.
- View definition is not the same as creating a new relation by evaluating the query expression
  - Rather, a view definition causes the saving of an expression; the expression is substituted into queries using the view.

Example Views
- A view of instructors without their salary
    create view faculty as
    select ID, name, dept_name
    from instructor;
- Find all instructors in the Biology department
    select name
    from faculty
    where dept_name = 'Biology';
- Create a view of department salary totals
    create view departments_total_salary (dept_name, total_salary) as
    select dept_name, sum (salary)
    from instructor
    group by dept_name;

Views Defined Using Other Views
- create view physics_fall_2009 as
    select course.course_id, sec_id, building, room_number
    from course, section
    where course.course_id = section.course_id
          and course.dept_name = 'Physics'
          and section.semester = 'Fall'
          and section.year = 2009;
- create view physics_fall_2009_watson as
    select course_id, room_number
    from physics_fall_2009
    where building = 'Watson';

View Expansion
- Expand use of a view in a query/another view
    create view physics_fall_2009_watson as
    select course_id, room_number
    from (select course.course_id, building, room_number
          from course, section
          where course.course_id = section.course_id
                and course.dept_name = 'Physics'
                and section.semester = 'Fall'
                and section.year = 2009)
    where building = 'Watson';

Views Defined Using Other Views
- One view may be used in the expression defining another view
- A view relation v1 is said to depend directly on a view relation v2 if v2 is used in the expression defining v1
- A view relation v1 is said to depend on view relation v2 if either v1 depends directly on v2 or there is a path of dependencies from v1 to v2
- A view relation v is said to be recursive if it depends on itself.

View Expansion
- A way to define the meaning of views defined in terms of other views.
- Let view v1 be defined by an expression e1 that may itself contain uses of view relations.
- View expansion of an expression repeats the following replacement step:
    repeat
        Find any view relation vi in e1
        Replace the view relation vi by the expression defining vi
    until no more view relations are present in e1
- As long as the view definitions are not recursive, this loop will terminate

Update of a View
- Add a new tuple to the faculty view which we defined earlier
    insert into faculty
    values ('30765', 'Green', 'History');
  This insertion must be represented by the insertion of the tuple
    ('30765', 'Green', 'History', null)
  into the instructor relation

Some Updates Cannot be Translated Uniquely
- create view instructor_info as
    select ID, name, building
    from instructor, department
    where instructor.dept_name = department.dept_name;
- /* caution: next query causes error */
    insert into instructor_info
    values ('69987', 'White', 'Taylor');
  - Which department, if multiple departments are in Taylor?
  - What if no department is in Taylor?
- Most SQL implementations allow updates only on simple views
  - The from clause has only one database relation.
  - The select clause contains only attribute names of the relation, and does not have any expressions, aggregates, or distinct specification.
  - Any attribute not listed in the select clause can be set to null
  - The query does not have a group by or having clause.

And Some Not at All
- create view history_instructors as
    select ID, name, dept_name, salary
    from instructor
    where dept_name = 'History';
- What happens if we insert ('25566', 'Brown', 'Biology', 100000) into history_instructors?
- Logical error! The tuple would be inserted into instructor, but since it does not satisfy the view's where clause, it never appears in history_instructors.

Materialized Views
- Materializing a view: create a physical table containing all the tuples in the result of the query defining the view
- If relations used in the query are updated, the materialized view result becomes out of date
  - Need to maintain the view, by updating the view whenever the underlying relations are updated.
    create view dbo.history_instructors_materialized
    with schemabinding as
    select ID, name, dept_name, salary
    from dbo.instructor
    where dept_name = 'History';
- Imagine that you have created a view without SCHEMABINDING and you have altered the schema of the underlying table (e.g., deleted one column). The next time you run your view, it will fail. Try it with a change in instructor (e.g., name -> surname).
- Creating a view with the SCHEMABINDING option locks the underlying tables and prevents any changes that may change the table schema. (In SQL Server, a schema-bound view is materialized only once a unique clustered index is created on it, which makes it an indexed view.)
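The view definitions and the view-expansion step above can be tried on SQLite via Python's `sqlite3`. All table contents are made up. The sketch compares the view-on-view query with its hand-expanded form, where each view name is replaced by its defining query. The expanded form adds the derived-table alias `v`. Note that SQLite treats views as read-only unless instead-of triggers are defined, so the insert into faculty is rejected here rather than translated into an insert on instructor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table instructor (ID text, name text, dept_name text, salary integer);
    create table course (course_id text, dept_name text);
    create table section (course_id text, sec_id text, semester text, year integer,
                          building text, room_number text);
    insert into instructor values ('1', 'Crick', 'Biology', 72000),
                                  ('2', 'Einstein', 'Physics', 95000);
    insert into course values ('PHY-101', 'Physics'), ('PHY-201', 'Physics'),
                              ('CS-101', 'Comp. Sci.');
    insert into section values ('PHY-101', '1', 'Fall', 2009, 'Watson', '100'),
                               ('PHY-201', '1', 'Fall', 2009, 'Packard', '101'),
                               ('CS-101', '1', 'Fall', 2009, 'Watson', '120');

    create view faculty as
        select ID, name, dept_name from instructor;

    create view physics_fall_2009 as
        select course.course_id, sec_id, building, room_number
        from course, section
        where course.course_id = section.course_id
          and course.dept_name = 'Physics'
          and section.semester = 'Fall' and section.year = 2009;

    create view physics_fall_2009_watson as
        select course_id, room_number from physics_fall_2009 where building = 'Watson';
""")

biology = conn.execute("select name from faculty where dept_name = 'Biology'").fetchall()
via_views = conn.execute("select * from physics_fall_2009_watson").fetchall()

# The same query after view expansion: the view name replaced by its definition.
expanded = conn.execute("""
    select course_id, room_number
    from (select course.course_id, building, room_number
          from course, section
          where course.course_id = section.course_id
            and course.dept_name = 'Physics'
            and section.semester = 'Fall' and section.year = 2009) as v
    where building = 'Watson'""").fetchall()

# SQLite views are read-only (without instead-of triggers), so this insert is rejected.
try:
    conn.execute("insert into faculty values ('30765', 'Green', 'History')")
    view_insert_rejected = False
except sqlite3.Error:
    view_insert_rejected = True

print(biology, via_views, view_insert_rejected)
```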
- Remember that objects must be referred to by their two-part names (ownername.objectname), e.g., dbo.instructor

Integrity Constraints
- Integrity constraints guard against accidental damage to the database, by ensuring that authorized changes to the database do not result in a loss of data consistency.
  - A checking account must have a balance greater than $10,000.00
  - A salary of a bank employee must be at least $4.00 an hour
  - A customer must have a (non-null) phone number

Integrity Constraints on a Single Relation
- not null
- primary key
- unique
- check (P), where P is a predicate

Not Null and Unique Constraints
- not null
  - Declare name and budget to be not null
        name varchar(20) not null
        budget numeric(12,2) not null
- unique (A1, A2, ..., Am)
  - The unique specification states that the attributes A1, A2, ..., Am form a candidate key.
  - Candidate keys are permitted to be null (in contrast to primary keys).

The check clause
- check (P), where P is a predicate
- Example: ensure that semester is one of Fall, Winter, Spring, or Summer:
    create table section (
        course_id varchar (8),
        sec_id varchar (8),
        semester varchar (6),
        year numeric (4,0),
        building varchar (15),
        room_number varchar (7),
        time_slot_id varchar (4),
        primary key (course_id, sec_id, semester, year),
        check (semester in ('Fall', 'Winter', 'Spring', 'Summer'))
    );
- Try it with the following insert, which the check constraint rejects because 'Sommer' is not an allowed semester:
    insert into section values ('105', '1', 'Sommer', 2009, 'Chandler', '375', 'C');

Referential Integrity
- Ensures that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation.
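The "try it" suggestion above can be run on SQLite via Python's `sqlite3`. SQLite enforces check, primary key and not null constraints and reports violations as `sqlite3.IntegrityError`. The sketch reuses the slide's section schema and adds a hypothetical department table for the not null case. The helper `accepted` is an illustrative name, not part of any API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table section (
        course_id varchar(8),
        sec_id varchar(8),
        semester varchar(6),
        year numeric(4,0),
        building varchar(15),
        room_number varchar(7),
        time_slot_id varchar(4),
        primary key (course_id, sec_id, semester, year),
        check (semester in ('Fall', 'Winter', 'Spring', 'Summer'))
    )""")
conn.execute("""
    create table department (
        dept_name varchar(20) primary key,
        budget numeric(12,2) not null
    )""")


def accepted(sql, row):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, row)
        return True
    except sqlite3.IntegrityError:
        return False


SECTION_INSERT = "insert into section values (?, ?, ?, ?, ?, ?, ?)"
ok = accepted(SECTION_INSERT, ("105", "1", "Summer", 2009, "Chandler", "375", "C"))
typo = accepted(SECTION_INSERT, ("105", "1", "Sommer", 2009, "Chandler", "375", "C"))  # check fails
dup = accepted(SECTION_INSERT, ("105", "1", "Summer", 2009, "Taylor", "3128", "A"))    # same key
no_budget = accepted("insert into department values (?, ?)", ("Music", None))         # not null
print(ok, typo, dup, no_budget)  # True False False False
```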
  - Example: If "Biology" is a department name appearing in one of the tuples in the instructor relation, then there exists a tuple in the department relation for "Biology".
- Let A be a set of attributes. Let R and S be two relations that contain attributes A, where A is the primary key of S. A is said to be a foreign key of R if any values of A appearing in R also appear in S.

Cascading Actions in Referential Integrity
- create table ref_course (
      course_id char(5) primary key,
      title varchar(20),
      dept_name varchar(20) references department
  )
- create table ref_course_cascade (
      course_id char(5) primary key,
      title varchar(20),
      dept_name varchar(20),
      foreign key (dept_name) references department
          on delete cascade
          on update cascade
  )
- Alternative actions to cascade: set null, set default

Cascading Actions in Referential Integrity
Try it:
- insert into ref_course values ('12345', 'Introduction', 'Football');
- delete from department where dept_name = 'Athletics';
- insert into ref_course_cascade values ('54321', 'Black Holes', 'Astronomy');
- delete from department where dept_name = 'Astronomy';
- select * from ref_course_cascade;

Constraint Violation During Transactions
    create table person (
        ID char(10),
        name char(40),
        spouse char(10),
        primary key (ID),
        foreign key (spouse) references person
    )
- How to insert tuples without causing constraint violation?
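The cascading-action experiment above can be reproduced on SQLite via Python's `sqlite3`. One caveat: SQLite only enforces foreign keys after `pragma foreign_keys = on`. The department rows are made up, and the Physics rows are an addition for contrast. Without a cascading action, the referenced department cannot be deleted. With `on delete cascade`, the referencing tuple disappears along with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    create table department (dept_name varchar(20) primary key, building varchar(15));
    create table ref_course (
        course_id char(5) primary key,
        title varchar(20),
        dept_name varchar(20) references department);
    create table ref_course_cascade (
        course_id char(5) primary key,
        title varchar(20),
        dept_name varchar(20),
        foreign key (dept_name) references department
            on delete cascade
            on update cascade);
    insert into department values ('Physics', 'Watson'), ('Astronomy', 'Taylor');
""")


def rejected(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# No 'Football' department exists, so the referencing tuple is rejected.
bad_ref = rejected("insert into ref_course values ('12345', 'Introduction', 'Football')")

# Without a cascading action, deleting a referenced department is rejected.
conn.execute("insert into ref_course values ('11111', 'Mechanics', 'Physics')")
restrict = rejected("delete from department where dept_name = 'Physics'")

# With on delete cascade, the referencing tuple disappears together with the department.
conn.execute("insert into ref_course_cascade values ('54321', 'Black Holes', 'Astronomy')")
conn.execute("delete from department where dept_name = 'Astronomy'")
left_over = conn.execute("select * from ref_course_cascade").fetchall()
print(bad_ref, restrict, left_over)  # True True []
```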
- Example: we want to insert John and Mary, who are married to each other
  - insert into person values ('123', 'John', '321');
  - insert into person values ('321', 'Mary', '123');
  - Set spouse to null initially, and update it after inserting all persons (not possible if the spouse attribute is declared to be not null)
  - OR defer constraint checking: declaring the constraint initially deferred causes it to be checked at the end of a transaction
  - Many database implementations do not support deferred constraint checking

Complex Check Clauses
- check constraints with subqueries:
    check (time_slot_id in (select time_slot_id from time_slot))
  - Same as using a foreign key
- Check that every section has at least one instructor teaching the section:
  - Declaring the attributes (course_id, sec_id, semester, year) of the section relation as a foreign key referencing the corresponding attributes of the teaches relation does not work:
  - (course_id, sec_id, semester, year) is not a candidate key of the teaches relation (we also need the ID of the teacher)
  - check ((course_id, sec_id, semester, year) in (select course_id, sec_id, semester, year from teaches)) is a solution
- Unfortunately, subqueries in check clauses are not supported by pretty much any database
  - Alternative: triggers (not covered)
- create assertion <assertion-name> check <predicate>;
  - Also not supported by pretty much any database

User-Defined Types
- The create type construct in SQL creates a user-defined type
    create type Dollars as numeric (12,2) final
    create type Dollars from numeric(12,2)    -- SQL Server syntax
  - create table department (
        dept_name varchar (20),
        building varchar (15),
        budget Dollars);

Domains
- The create domain construct in SQL-92 creates user-defined domain types
    create domain person_name char(20) not null
    create type person_name from char(20) not null    -- SQL Server syntax (no create domain)
- Types and domains are similar.
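SQLite is one system that does support deferred checking for foreign keys, so the John and Mary example can be run via Python's `sqlite3`. The sketch uses SQLite's `deferrable initially deferred` clause, and spouse holds the other person's ID. With immediate checking, John's insert fails at once because '321' does not exist yet. With deferred checking, both inserts succeed, and the constraint is verified at commit. A dangling reference is still caught, only later. The table `person_immediate` is a hypothetical copy added for contrast.

```python
import sqlite3

# isolation_level=None: autocommit mode, so transactions are opened and closed explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("pragma foreign_keys = on")
conn.executescript("""
    create table person (
        ID char(10) primary key,
        name char(40),
        spouse char(10),
        foreign key (spouse) references person (ID) deferrable initially deferred);
    create table person_immediate (
        ID char(10) primary key,
        name char(40),
        spouse char(10),
        foreign key (spouse) references person_immediate (ID));
""")

# Immediate checking: John's spouse '321' does not exist yet, so the insert fails at once.
try:
    conn.execute("insert into person_immediate values ('123', 'John', '321')")
    immediate_ok = True
except sqlite3.IntegrityError:
    immediate_ok = False

# Deferred checking: the constraint is only checked when the transaction commits.
conn.execute("begin")
conn.execute("insert into person values ('123', 'John', '321')")
conn.execute("insert into person values ('321', 'Mary', '123')")
conn.execute("commit")
married = conn.execute("select ID, name, spouse from person order by ID").fetchall()

# A dangling spouse reference is still caught, but only at commit time.
conn.execute("begin")
conn.execute("insert into person values ('999', 'Pat', '777')")
try:
    conn.execute("commit")
    commit_ok = True
except sqlite3.IntegrityError:
    commit_ok = False
    conn.execute("rollback")  # the failed commit leaves the transaction open

print(immediate_ok, married, commit_ok)
```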
- Domains can have constraints, such as not null, specified on them.

Large-Object Types
- Large objects (photos, videos, CAD files, etc.) are stored as a large object:
  - blob: binary large object, i.e., a large collection of uninterpreted binary data (whose interpretation is left to an application outside of the database system)
  - clob: character large object, i.e., a large collection of character data
  - When a query returns a large object, a pointer is returned rather than the large object itself.
- The SQL Server ntext, text, and image data types are capable of holding extremely large amounts of data, up to 2 GB, in a single value.

Advanced SQL
- Functions and Procedural Constructs
- Recursion
- Ranking

Procedural Constructs in SQL

Procedural Extensions and Stored Procedures
- SQL provides a module language
  - Permits definition of procedures in SQL, with if-then-else statements, for and while loops, etc.
- Stored procedures
  - Can store procedures in the database
  - then execute them using the call statement
  - permit external applications to operate on the database without knowing about internal details

Functions and Procedures
- SQL:1999 supports functions and procedures
  - Functions/procedures can be written in SQL itself, or in an external programming language.
  - Functions are particularly useful with specialized data types such as images and geometric objects.
    - Example: functions to check if polygons overlap, or to compare images for similarity.
  - Some database systems support table-valued functions, which can return a relation as a result.
- SQL:1999 also supports a rich set of imperative constructs, including
  - Loops, if-then-else, assignment
- Many databases have proprietary procedural extensions to SQL that differ from SQL:1999.

SQL Functions
- Define a function that, given the name of a department, returns the count of the number of instructors in that department.
    create function dept_count (dept_name varchar(20))
    returns integer
    begin
        declare d_count integer;
        select count (*) into d_count
        from instructor
        where instructor.dept_name = dept_name;
        return d_count;
    end
- SQL Server version:
    create function dept_count (@dept_name varchar(20))
    returns int
    as
    begin
        declare @d_count int;
        select @d_count = count (*)
        from instructor
        where instructor.dept_name = @dept_name;
        return @d_count;
    end

SQL Functions
- Find the department name and budget of all departments with more than 12 instructors.
    select dept_name, budget
    from department
    where dept_count (dept_name) > 12;
- SQL Server version (scalar functions must be called with their schema name):
    select dept_name, budget
    from department
    where dbo.dept_count (dept_name) > 12;

Table Functions
- SQL:2003 added functions that return a relation as a result
- Example: Return all instructors of a given department
    create function instructors_of (dept_name char(20))
        returns table (
            ID varchar(5),
            name varchar(20),
            dept_name varchar(20),
            salary numeric(8,2))
    return table
        (select ID, name, dept_name, salary
         from instructor
         where instructor.dept_name = instructors_of.dept_name)
- SQL Server version:
    create function instructors_of (@dept_name varchar(20))
    returns @instructors_of_table table (
        ID varchar(5),
        name varchar(20),
        dept_name varchar(20),
        salary numeric(8,2))
    as
    begin
        insert @instructors_of_table
        select instructor.ID, instructor.name, instructor.dept_name, instructor.salary
        from instructor
        where instructor.dept_name = @dept_name;
        return;
    end

Table Functions
- Usage
    select *
    from table (instructors_of ('Music'))
- SQL Server:
    select *
    from instructors_of ('Music')

SQL Procedures
- The dept_count function could instead be written as a procedure:
    create procedure dept_count_proc (in dept_name varchar(20), out d_count integer)
    begin
        select count(*) into d_count
        from instructor
        where instructor.dept_name = dept_count_proc.dept_name;
    end
- SQL Server version:
    create procedure dept_count_proc @dept_name varchar(20), @d_count int OUTPUT
    as
    select @d_count = count(*)
    from instructor
    where instructor.dept_name = @dept_name
    /* optional printing */
    print 'The count is ' + RTRIM(CAST(@d_count AS varchar(20)))

SQL Procedures
- Procedures can be invoked either from an SQL procedure or from embedded SQL, using the call statement.
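SQLite has no create function or create procedure statement, so this sketch approximates dept_count and instructors_of on SQLite via Python's `sqlite3`. The Python helpers `dept_count` and `instructors_of` are hypothetical host-language stand-ins, not the slides' SQL routines. The filter `dept_count(dept_name) > threshold` is also written inline as the correlated scalar subquery the function body amounts to, and the two results are compared. The made-up instance is tiny, so the threshold is 1 rather than the slide's 12 instructors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table department (dept_name text, budget integer);
    create table instructor (ID text, name text, dept_name text, salary integer);
    insert into department values ('Comp. Sci.', 100000), ('Physics', 70000),
                                  ('Music', 80000), ('History', 50000);
    insert into instructor values ('1', 'Srinivasan', 'Comp. Sci.', 65000),
                                  ('2', 'Katz', 'Comp. Sci.', 75000),
                                  ('3', 'Brandt', 'Comp. Sci.', 92000),
                                  ('4', 'Einstein', 'Physics', 95000),
                                  ('5', 'Gold', 'Physics', 87000),
                                  ('6', 'Mozart', 'Music', 40000);
""")


def dept_count(dept_name):
    """Hypothetical host-language stand-in for the dept_count SQL function."""
    (count,) = conn.execute(
        "select count(*) from instructor where instructor.dept_name = ?", (dept_name,)
    ).fetchone()
    return count


def instructors_of(dept_name):
    """Hypothetical host-language stand-in for the instructors_of table function."""
    return conn.execute(
        "select ID, name, dept_name, salary from instructor where dept_name = ?",
        (dept_name,),
    ).fetchall()


threshold = 1  # small instance; the slide's query uses 12
# Inlined form of "where dept_count(dept_name) > threshold": a correlated scalar subquery.
in_sql = conn.execute(
    """select dept_name, budget from department
       where (select count(*) from instructor
              where instructor.dept_name = department.dept_name) > ?
       order by dept_name""",
    (threshold,),
).fetchall()
departments = conn.execute("select dept_name, budget from department order by dept_name").fetchall()
in_python = [(d, b) for d, b in departments if dept_count(d) > threshold]
print(in_sql, instructors_of("Music"))
```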
  Standard SQL:

    declare d_count integer;
    call dept_count_proc('Physics', d_count);

  SQL Server (T-SQL), where OUTPUT is needed at the call site to receive the value:

    declare @d_count int;
    EXECUTE dept_count_proc 'Comp. Sci.', @d_count OUTPUT;

- Procedures and functions can also be invoked from dynamic SQL.
- SQL:1999 allows more than one function/procedure of the same name (called name overloading), as long as the number of arguments differs, or at least the types of the arguments differ.

Procedural Constructs
- Warning: most database systems implement their own variant of the standard syntax below
  - read your system manual to see what works on your system
- Compound statement: begin … end
  - May contain multiple SQL statements between begin and end.
  - Local variables can be declared within a compound statement
- While and repeat statements:

    declare n integer default 0;
    while n < 10 do
        set n = n + 1;
    end while;

    repeat
        set n = n - 1;
    until n = 0
    end repeat;

Procedural Constructs (Cont.)
- For loop
  - Permits iteration over all results of a query
  - Example:

    declare n integer default 0;
    for r as
        select budget from department
        where dept_name = 'Music'
    do
        set n = n - r.budget;
    end for;

Procedural Constructs (Cont.)
- Conditional statements (if-then-else)
  - SQL:1999 also supports a case statement similar to the C switch (case) statement
- Example procedure: registers a student after ensuring that the classroom capacity is not exceeded
  - Returns 0 on success and -1 if capacity is exceeded
  - See the book for details
- Signaling of exception conditions, and declaring handlers for exceptions

    declare out_of_classroom_seats condition
    declare exit handler for out_of_classroom_seats
    begin
        …
        signal out_of_classroom_seats
    end

  - The handler here is exit, which causes the enclosing begin … end to be exited
  - Other actions are possible on an exception

External Language Functions/Procedures
- SQL:1999 permits the use of functions and procedures written in other languages such as C or C++
- Declaring external language procedures and functions:

    create procedure dept_count_proc(in dept_name varchar(20), out count integer)
    language C
    external name '/usr/avi/bin/dept_count_proc'

    create function dept_count(dept_name varchar(20))
    returns integer
    language C
    external name '/usr/avi/bin/dept_count'

External Language Routines (Cont.)
- Benefits of external language functions/procedures:
  - more efficient for many operations, and more expressive power
- Drawbacks
  - Code to implement the function may need to be loaded into the database system and executed in the database system's address space
    - risk of accidental corruption of database structures
    - security risk, allowing users access to unauthorized data
  - There are alternatives, which give good security at the cost of potentially worse performance
  - Direct execution in the database system's address space is used when efficiency is more important than security

Security with External Language Routines
- To deal with security problems:
  - Use sandbox techniques
    - that is, use a safe language such as Java, which cannot be used to access or damage other parts of the database code
  - Or run external language functions/procedures in a separate process, with no access to the database process's memory
    - Parameters and results are communicated via inter-process communication
- Both approaches have performance overheads
- Many database systems support both of the above approaches, as well as direct execution in the database system's address space.
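SQLite runs inside the application's own process, so a Python function registered with the standard-library method sqlite3.Connection.create_function(name, narg, func) executes in the same address space as the database engine, with no sandbox. That corresponds to the "direct execution" option above. The sketch assumes a hypothetical shapes table of axis-aligned rectangles and a hypothetical rects_overlap function, echoing the polygon-overlap example from the Functions and Procedures slide; it illustrates the idea and is not how any particular server DBMS loads C routines.

```python
import sqlite3


def rects_overlap(ax1, ay1, ax2, ay2, bx1, by1, bx2, by2):
    """Return 1 if rectangles (ax1, ay1)-(ax2, ay2) and (bx1, by1)-(bx2, by2) overlap, else 0.

    Corners are (lower-left, upper-right); rectangles that only touch count as overlapping.
    """
    separated = ax2 < bx1 or bx2 < ax1 or ay2 < by1 or by2 < ay1
    return 0 if separated else 1


conn = sqlite3.connect(":memory:")
# Register the Python function under an SQL name taking 8 arguments;
# SQLite calls it in-process while evaluating queries.
conn.create_function("rects_overlap", 8, rects_overlap)

conn.execute("create table shapes (id integer primary key, x1 real, y1 real, x2 real, y2 real)")
conn.executemany("insert into shapes values (?, ?, ?, ?, ?)", [
    (1, 0, 0, 2, 2),
    (2, 1, 1, 3, 3),  # overlaps shape 1
    (3, 5, 5, 6, 6),  # overlaps neither
])

# Find every pair of overlapping shapes, each pair reported once.
pairs = conn.execute("""
    select a.id, b.id
    from shapes a join shapes b on a.id < b.id
    where rects_overlap(a.x1, a.y1, a.x2, a.y2, b.x1, b.y1, b.x2, b.y2) = 1
    order by a.id, b.id
""").fetchall()
print(pairs)  # [(1, 2)]
```

Because the function runs inside the engine's process, a bug in it can affect the whole application, which is exactly the trade-off the slides describe.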
Recursive Queries

Recursion in SQL
- SQL:1999 permits recursive view definition
- Example: find which courses are a prerequisite, whether directly or indirectly, for a specific course

  Standard SQL:

    with recursive rec_prereq(course_id, prereq_id) as (
        select course_id, prereq_id
        from prereq
      union
        select rec_prereq.course_id, prereq.prereq_id
        from rec_prereq, prereq
        where rec_prereq.prereq_id = prereq.course_id
    )
    select * from rec_prereq;

  This example view, rec_prereq, is called the transitive closure of the prereq relation.
  Note: the 1st printing of the 6th edition erroneously used c_prereq in place of rec_prereq in some places.

Recursion in SQL
  SQL Server (T-SQL), which requires union all between the anchor and recursive parts of a recursive common table expression:

    with rec_prereq as (
        select course_id, prereq_id
        from prereq
      union all
        select rec_prereq.course_id, prereq.prereq_id
        from rec_prereq inner join prereq
            on rec_prereq.prereq_id = prereq.course_id
    )
    select * from rec_prereq
    OPTION (MAXRECURSION 5);

  Check: the result contains course_id = 972, prereq_id = 139, because it contains course_id = 972, prereq_id = 958 and course_id = 958, prereq_id = 139.
  Uncomment the 2 entries of largeRelationsInsertFile.sql about insertions in prereq. What happens now?

The Power of Recursion
- Recursive views make it possible to write queries, such as transitive closure queries, that cannot be written without recursion or iteration.
  - Intuition: without recursion or iteration, a query can perform only a fixed number of joins of prereq with itself
    - This can give only a fixed number of levels of prerequisites
    - Given a fixed non-recursive query, we can construct a database with a greater number of levels of prerequisites on which the query will not work

Advanced Aggregation Features

Ranking
- Ranking is done in conjunction with an order by specification.
- Suppose we are given a relation student_grades(ID, GPA) (see Exercise 2A.2c) giving the grade-point average of each student
- Find the rank of each student:

    select ID, rank() over (order by GPA desc) as s_rank
    from student_grades

- An extra order by clause is needed to get the result in sorted order:

    select ID, rank() over (order by GPA desc) as s_rank
    from student_grades
    order by s_rank

- Ranking may leave gaps: e.g., if 2 students have the same top GPA, both have rank 1, and the next rank is 3
  - dense_rank does not leave gaps, so the next dense rank would be 2

Ranking
- Ranking can be done using basic SQL aggregation, but the resulting query is very inefficient:

    select ID, (1 + (select count(*)
                     from student_grades B
                     where B.GPA > A.GPA)) as s_rank
    from student_grades A
    order by s_rank;

  - The rank of a student is merely 1 plus the number of students with a higher GPA
  - The overall time is quadratic in the size of the relation

Ranking (Cont.)

    create view dept_grades as
        select student.ID, student.dept_name, student_grades.GPA
        from student join student_grades on student.ID = student_grades.ID

- Ranking can be done within a partition of the data.
- “Find the rank of students within each department.”

    select ID, dept_name,
           rank() over (partition by dept_name order by GPA desc) as dept_rank
    from dept_grades
    order by dept_name, dept_rank;

- Multiple rank clauses can occur in a single select clause.
- Ranking is done after applying the group by clause/aggregation
- Can be used to find top-n results
  - More general than the limit n clause supported by many databases (select top n in SQL Server), since it allows top-n within each partition

Ranking (Cont.)
- Other ranking functions:
  - percent_rank (within partition, if partitioning is done)
  - cume_dist (cumulative distribution)
    - fraction of tuples with preceding or equal values
  - row_number (non-deterministic in presence of duplicates)
- SQL:1999 permits the user to specify nulls first or nulls last

  Standard SQL:

    select ID, rank() over (order by GPA desc nulls last) as s_rank
    from student_grades

  SQL Server (T-SQL), which has no nulls last option; sorting on a null flag first has the same effect:

    select ID, rank() over (order by case when GPA is null then 1 else 0 end, GPA desc) as s_rank
    from student_grades

Ranking (Cont.)
- For a given constant n, the ranking function ntile(n) takes the tuples in each partition in the specified order and divides them into n buckets with equal numbers of tuples (when the count is not divisible by n, bucket sizes differ by at most one).
- E.g.,

    select ID, ntile(4) over (order by GPA desc) as quartile
    from student_grades;
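The recursive transitive-closure query and the ranking queries of the preceding Advanced SQL slides can be tried without a database server through Python's built-in sqlite3 module. SQLite supports with recursive and the window functions rank, dense_rank and ntile (window functions need SQLite 3.25 or later), but it has no stored functions or procedures. A minimal sketch; all rows are a small hypothetical sample, not the course database, and the course IDs 972, 958 and 139 are borrowed from the Check note on the Recursion in SQL slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table prereq (course_id text, prereq_id text);
    insert into prereq values ('972', '958'), ('958', '139');

    create table student (ID text primary key, dept_name text);
    create table student_grades (ID text primary key, GPA real);
    insert into student values ('S1', 'Comp. Sci.'), ('S2', 'Comp. Sci.'),
        ('S3', 'Physics'), ('S4', 'Physics'), ('S5', 'Comp. Sci.');
    insert into student_grades values ('S1', 3.9), ('S2', 3.9),
        ('S3', 3.5), ('S4', 3.1), ('S5', 2.8);
    create view dept_grades as
        select student.ID, student.dept_name, student_grades.GPA
        from student join student_grades on student.ID = student_grades.ID;
""")

# Transitive closure of prereq, written like the standard-SQL version on the slide.
closure = conn.execute("""
    with recursive rec_prereq(course_id, prereq_id) as (
        select course_id, prereq_id from prereq
        union
        select rec_prereq.course_id, prereq.prereq_id
        from rec_prereq join prereq on rec_prereq.prereq_id = prereq.course_id
    )
    select * from rec_prereq order by course_id, prereq_id
""").fetchall()

# rank leaves gaps after ties, dense_rank does not; ntile(4) splits the rows into 4 buckets.
ranks = conn.execute("""
    select ID,
           rank()       over (order by GPA desc) as s_rank,
           dense_rank() over (order by GPA desc) as s_dense_rank,
           ntile(4)     over (order by GPA desc) as quartile
    from student_grades
    order by s_rank, ID
""").fetchall()

# The same ranks via a correlated subquery (quadratic in the size of the relation).
slow_ranks = conn.execute("""
    select ID, 1 + (select count(*) from student_grades B where B.GPA > A.GPA) as s_rank
    from student_grades A
    order by s_rank, ID
""").fetchall()

# Ranking within each department.
dept_ranks = conn.execute("""
    select ID, dept_name,
           rank() over (partition by dept_name order by GPA desc) as dept_rank
    from dept_grades
    order by dept_name, dept_rank, ID
""").fetchall()

print(closure)     # [('958', '139'), ('972', '139'), ('972', '958')]
print(ranks)       # [('S1', 1, 1, 1), ('S2', 1, 1, 1), ('S3', 3, 2, 2), ('S4', 4, 3, 3), ('S5', 5, 4, 4)]
print(dept_ranks)  # [('S1', 'Comp. Sci.', 1), ('S2', 'Comp. Sci.', 1), ('S5', 'Comp. Sci.', 3), ('S3', 'Physics', 1), ('S4', 'Physics', 2)]
```

With 5 rows and ntile(4), the first bucket gets two rows and the others one each; the tied students S1 and S2 land in bucket 1 whichever order the tie is broken in.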
Entity-Relationship Model
- Design Process
- Modeling
- Constraints
- E-R Diagram
- Design Issues
- Weak Entity Sets
- Extended E-R Features
- Design of the Bank Database
- Reduction to Relation Schemas
- Database Design

Modeling
- A database can be modeled as:
  - a collection of entities,
  - relationships among entities.
- An entity is an object that exists and is distinguishable from other objects.
  - Example: a specific person, company, event, plant
- Entities have attributes
  - Example: people have names and addresses
- An entity set is a set of entities of the same type that share the same properties.
  - Example: the set of all persons, companies, trees, holidays

Entity Sets instructor and student
[Figure: the instructor entity set (instructor_ID, instructor_name) and the student entity set (student-ID, student_name)]

Relationship Sets
- A relationship is an association among several entities
  - Example: the advisor relationship set associates the student entity 44553 (Peltier) with the instructor entity 22222 (Einstein)
- A relationship set is a mathematical relation among $n \ge 2$ entities, each taken from entity sets

    $\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$

  where $(e_1, e_2, \ldots, e_n)$ is a relationship
  - Example: $(44553, 22222) \in \mathit{advisor}$

Relationship Set advisor

Relationship Sets (Cont.)
- An attribute can also be a property of a relationship set.
- For instance, the advisor relationship set between entity sets instructor and student may have the attribute date, which tracks when the student started being associated with the advisor

Degree of a Relationship Set
- Binary relationship
  - involves two entity sets (or degree two).
  - most relationship sets in a database system are binary.
- Relationships between more than two entity sets are rare. Most relationships are binary. (More on this later.)
  - Example: students work on research projects under the guidance of an instructor.
  - relationship proj_guide is a ternary relationship between instructor, student, and project

Attributes
- An entity is represented by a set of attributes, that is, descriptive properties possessed by all members of an entity set.
  - Example:
      instructor = (ID, name, street, city, salary)
      course = (course_id, title, credits)
- Domain: the set of permitted values for each attribute
- Attribute types:
  - Simple and composite attributes.
  - Single-valued and multivalued attributes
    - Example: multivalued attribute phone_numbers
  - Derived attributes
    - Can be computed from other attributes
    - Example: age, given date_of_birth

Composite Attributes

Mapping Cardinality Constraints
- Express the number of entities to which another entity can be associated via a relationship set.
- Most useful in describing binary relationship sets.
- For a binary relationship set the mapping cardinality must be one of the following types:
  - One to one
  - One to many
  - Many to one
  - Many to many

Mapping Cardinalities
[Figure: one-to-one and one-to-many mappings between sets A and B]
Note: Some elements in A and B may not be mapped to any elements in the other set

Mapping Cardinalities
[Figure: many-to-one and many-to-many mappings between sets A and B]
Note: Some elements in A and B may not be mapped to any elements in the other set

Keys
- A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity.
- A candidate key of an entity set is a minimal super key
  - ID is a candidate key of instructor
  - course_id is a candidate key of course
- Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.

Keys for Relationship Sets
- The combination of primary keys of the participating entity sets forms a super key of a relationship set.
  - (s_id, i_id) is a super key of advisor
  - NOTE: this means a pair of entities (one from each entity set) can have at most one relationship in a particular relationship set.
    - Example: if we wish to track multiple meeting dates between a student and her advisor, we cannot assume a relationship for each meeting.
      We can use a multivalued attribute, though.
- Must consider the mapping cardinality of the relationship set when deciding what the candidate keys are
- Need to consider the semantics of the relationship set in selecting the primary key when there is more than one candidate key

Redundant Attributes
- Suppose we have entity sets
  - instructor, with attributes including dept_name
  - department
  and a relationship
  - inst_dept relating instructor and department
- Attribute dept_name in entity instructor is redundant, since there is an explicit relationship inst_dept that relates instructors to departments
  - The attribute replicates information present in the relationship, and should be removed from instructor
  - BUT: when converting to tables, in some cases the attribute gets reintroduced, as we will see.

E-R Diagrams
- Rectangles represent entity sets.
- Diamonds represent relationship sets.
- Attributes are listed inside the entity rectangle
- Underline indicates primary key attributes

Entity With Composite, Multivalued, and Derived Attributes

Relationship Sets with Attributes

Roles
- Entity sets of a relationship need not be distinct
  - Each occurrence of an entity set plays a “role” in the relationship
- The labels “course_id” and “prereq_id” are called roles.

Cardinality Constraints
- We express cardinality constraints by drawing either a directed line (→), signifying “one,” or an undirected line, signifying “many,” between the relationship set and the entity set.
- One-to-one relationship:
  - A student is associated with at most one instructor via the relationship advisor
  - A student is associated with at most one department via stud_dept

One-to-One Relationship
- one-to-one relationship between an instructor and a student
  - an instructor is associated with at most one student via advisor
  - and a student is associated with at most one instructor via advisor

One-to-Many Relationship
- one-to-many relationship between an instructor and a student
  - an instructor is associated with several (including 0) students via advisor
  - a student is associated with at most one instructor via advisor

Many-to-One Relationships
- In a many-to-one relationship between an instructor and a student,
  - an instructor is associated with at most one student via advisor,
  - and a student is associated with several (including 0) instructors via advisor

Many-to-Many Relationship
- An instructor is associated with several (possibly 0) students via advisor
- A student is associated with several (possibly 0) instructors via advisor

Participation of an Entity Set in a Relationship Set
- Total participation (indicated by double line): every entity in the entity set participates in at least one relationship in the relationship set
  - E.g., participation of section in sec_course is total
    - every section must have an associated course
- Partial participation: some entities may not participate in any relationship in the relationship set
  - Example: participation of instructor in advisor is partial
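Once an E-R design is turned into tables (the reduction treated later in this chapter), a one-to-many constraint such as "a student has at most one advisor" can be enforced by making the student's key the primary key of the advisor table. A minimal sketch with Python's sqlite3 module; the sample rows are chosen for illustration, and foreign keys are checked only because the sketch turns on SQLite's foreign_keys pragma.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite checks foreign keys only when this is on
conn.executescript("""
    create table instructor (ID text primary key, name text);
    create table student    (ID text primary key, name text);
    -- s_id alone is the key: each student appears at most once (at most one advisor),
    -- while the same i_id may appear in many rows (an instructor advises many students)
    create table advisor (
        s_id text primary key references student(ID),
        i_id text not null references instructor(ID)
    );
""")
conn.executemany("insert into instructor values (?, ?)",
                 [("22222", "Einstein"), ("45565", "Katz")])
conn.executemany("insert into student values (?, ?)",
                 [("44553", "Peltier"), ("12345", "Shankar")])

conn.execute("insert into advisor values ('44553', '22222')")
conn.execute("insert into advisor values ('12345', '22222')")  # Einstein advises a second student: allowed

try:
    conn.execute("insert into advisor values ('44553', '45565')")  # a second advisor for Peltier
    second_advisor_rejected = False
except sqlite3.IntegrityError:
    second_advisor_rejected = True

# Partial participation: Katz advises nobody, which the schema permits.
unadvising = conn.execute(
    "select name from instructor where ID not in (select i_id from advisor)").fetchall()
print(second_advisor_rejected, unadvising)  # True [('Katz',)]
```

Making (s_id, i_id) the key instead would turn the same table into the many-to-many version of advisor.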
Alternative Notation for Cardinality Limits
- Cardinality limits can also express participation constraints

E-R Diagram with a Ternary Relationship

Cardinality Constraints on Ternary Relationship
- We allow at most one arrow out of a ternary (or greater degree) relationship to indicate a cardinality constraint
- E.g., an arrow from proj_guide to instructor indicates each student has at most one guide for a project
- If there is more than one arrow, there are two ways of defining the meaning.
  - E.g., a ternary relationship R between A, B and C with arrows to B and C could mean
    1. each A entity is associated with a unique entity from B and C, or
    2. each pair of entities from (A, B) is associated with a unique C entity, and each pair (A, C) is associated with a unique B
  - Each alternative has been used in different formalisms
  - To avoid confusion we outlaw more than one arrow

Weak Entity Sets
- An entity set that does not have a primary key is referred to as a weak entity set.
- The existence of a weak entity set depends on the existence of an identifying entity set
  - It must relate to the identifying entity set via a total, one-to-many relationship set from the identifying to the weak entity set
  - The identifying relationship is depicted using a double diamond
- The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes among all the entities of a weak entity set for a given strong entity (e.g., insured children's first names)
- The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak entity set is existence dependent, plus the weak entity set's discriminator.
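In SQL, the weak entity set section can be sketched as a table whose primary key combines the key of its identifying entity set (course_id) with its discriminator (sec_id, semester, year). The not null foreign key models total participation in the identifying relationship, and on delete cascade is one common way (a design choice assumed here, not something the E-R model prescribes) to express existence dependence. The course and section rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # needed for references / on delete cascade in SQLite
conn.executescript("""
    create table course (course_id text primary key, title text);
    -- Weak entity set section: key = key of the identifying entity set (course_id)
    -- plus the discriminator (sec_id, semester, year).
    create table section (
        course_id text not null references course(course_id) on delete cascade,
        sec_id    text not null,
        semester  text not null,
        year      integer not null,
        primary key (course_id, sec_id, semester, year)
    );
""")
conn.executemany("insert into course values (?, ?)",
                 [("CS-101", "Intro. to Computer Science"), ("BIO-101", "Intro. to Biology")])
conn.executemany("insert into section values (?, ?, ?, ?)", [
    ("CS-101", "1", "Fall", 2009),
    ("CS-101", "1", "Spring", 2010),
    ("BIO-101", "1", "Fall", 2009),  # same discriminator as a CS-101 section, different owner
])

# Total participation: a section must refer to an existing course ...
try:
    conn.execute("insert into section values ('XX-999', '1', 'Fall', 2009)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# ... and existence dependence: deleting the identifying course removes its sections.
conn.execute("delete from course where course_id = 'CS-101'")
remaining = conn.execute("select course_id, sec_id, semester, year from section").fetchall()
print(orphan_rejected, remaining)  # True [('BIO-101', '1', 'Fall', 2009)]
```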
Weak Entity Sets (Cont.)
- We underline the discriminator of a weak entity set with a dashed line.
- We put the identifying relationship of a weak entity in a double diamond.
- Primary key for section: (course_id, sec_id, semester, year)

Weak Entity Sets (Cont.)
- Note: the primary key of the strong entity set is not explicitly stored with the weak entity set, since it is implicit in the identifying relationship.
- If course_id were explicitly stored, section could be made a strong entity, but then the relationship between section and course would be duplicated by an implicit relationship defined by the attribute course_id common to course and section

E-R Diagram for a University Enterprise

Symbols used in the E-R notation.

Reduction to Relational Schemas

Reduction to Relation Schemas
- Entity sets and relationship sets can be expressed uniformly as relation schemas that represent the contents of the database.
- A database which conforms to an E-R diagram can be represented by a collection of schemas.
- For each entity set and relationship set there is a unique schema that is assigned the name of the corresponding entity set or relationship set.
- Each schema has a number of columns (generally corresponding to attributes), which have unique names.
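The uniform reduction can be sketched as two small Python functions over a hypothetical in-memory description of entity and relationship sets. The sketch covers only strong entity sets with simple attributes and many-to-many relationship sets; the EntitySet and RelationshipSet classes and the s_/i_ role prefixes are assumptions of this illustration, chosen to mirror the student and advisor schemas used in this chapter (including the descriptive attribute date of advisor).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EntitySet:
    name: str
    attributes: list[str]   # simple attributes only
    primary_key: list[str]  # a subset of attributes


@dataclass
class RelationshipSet:
    name: str
    participants: list[tuple[str, EntitySet]]  # (role prefix, participating entity set)
    attributes: list[str] = field(default_factory=list)  # descriptive attributes


def reduce_entity(e: EntitySet) -> tuple[str, list[str]]:
    """Strong entity set: a schema with the same name, attributes and primary key."""
    return f"{e.name}({', '.join(e.attributes)})", list(e.primary_key)


def reduce_many_to_many(r: RelationshipSet) -> tuple[str, list[str]]:
    """Many-to-many relationship set: the participants' primary keys (role-prefixed so
    column names stay unique) plus descriptive attributes; the key is the union of the
    participants' keys."""
    key = [prefix + k for prefix, e in r.participants for k in e.primary_key]
    return f"{r.name}({', '.join(key + r.attributes)})", key


student = EntitySet("student", ["ID", "name", "tot_cred"], ["ID"])
instructor = EntitySet("instructor", ["ID", "name", "salary"], ["ID"])
advisor = RelationshipSet("advisor", [("s_", student), ("i_", instructor)], ["date"])

for schema, key in (reduce_entity(student), reduce_entity(instructor),
                    reduce_many_to_many(advisor)):
    print(schema, "key:", key)
# student(ID, name, tot_cred) key: ['ID']
# instructor(ID, name, salary) key: ['ID']
# advisor(s_ID, i_ID, date) key: ['s_ID', 'i_ID']
```

Each entity set and relationship set yields exactly one schema carrying its own name, which is the uniformity the slide describes; weak entity sets, composite and multivalued attributes would need further rules.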
Representing Entity Sets With Simple Attributes
- A strong entity set reduces to a schema with the same attributes
    student(ID, name, tot_cred)
- A weak entity set becomes a table that includes a column for the primary key of the identifying strong entity set
    section(course_id, sec_id, semester, year)

Representing Relationship Sets
- A many-to-many relationship set is represented as a schema with attributes for the primary keys of the two participating entity sets, and any descriptive attributes of the relationship set.
- Example: schema for relationship set advisor
    advisor = (s_id, i_id)

Redundancy of Schemas
- Many-to-one and one-to-many relationship sets that are total on the many side can be represented by adding an extra attribute to the “many” side, containing the primary key of the “one” side
- Example: instead of creating a schema for relationship set inst_dept, add an attribute dept_name to the schema arising from entity set instructor

Redundancy of Schemas (Cont.)
- For one-to-one relationship sets, either side can be chosen to act as the “many” side
  - That is, an extra attribute can be added to either of the tables corresponding to the two entity sets
- If participation is partial on the “many” side, replacing a schema by an extra attribute in the schema corresponding to the “many” side could result in null values
- The schema corresponding to a relationship set linking a weak entity set to its identifying strong entity set is redundant.
  - Example: the section schema already contains the attributes that would appear in the sec_course schema

Composite and Multivalued Attributes
- Composite attributes are flattened out by creating a separate attribute for each component attribute
  - Example: given entity set instructor with composite attribute name with component attributes first_name and last_name, the schema corresponding to the entity set has two attributes name_first_name and name_last_name
    - Prefix omitted if there is no ambiguity
- Ignoring multivalued attributes, the extended instructor schema is
    instructor(ID, first_name, middle_initial, last_name, street_number, street_name, apt_number, city, state, zip_code, date_of_birth)

Composite and Multivalued Attributes
- A multivalued attribute M of an entity E is represented by a separate schema EM
  - Schema EM has attributes corresponding to the primary key of E and an attribute corresponding to multivalued attribute M
  - Example: multivalued attribute phone_number of instructor is represented by a schema:
      inst_phone = (ID, phone_number)
  - Each value of the multivalued attribute maps to a separate tuple of the relation on schema EM
    - For example, an instructor entity with primary key 22222 and phone numbers 456-7890 and 123-4567 maps to two tuples: (22222, 456-7890) and (22222, 123-4567)

Multivalued Attributes (Cont.)
- Special case: entity time_slot has only one attribute other than the primary-key attribute, and that attribute is multivalued
  - Optimization: don't create the relation corresponding to the entity, just create the one corresponding to the multivalued attribute
  - time_slot(time_slot_id, day, start_time, end_time)
  - Caveat: the time_slot attribute of section (from sec_time_slot) cannot be a foreign key due to this optimization

Design Issues
- Use of entity sets vs. attributes
- Use of phone as an entity allows extra information about phone numbers (plus multiple phone numbers)

Design Issues
- Use of entity sets vs. relationship sets
  - A possible guideline is to designate a relationship set to describe an action that occurs between entities

Design Issues
- Binary versus n-ary relationship sets
  - Although it is possible to replace any nonbinary (n-ary, for n > 2) relationship set by a number of distinct binary relationship sets, an n-ary relationship set shows more clearly that several entities participate in a single relationship.
- Placement of relationship attributes
  - e.g., attribute date as an attribute of advisor or as an attribute of student

Binary Vs. Non-Binary Relationships
- Some relationships that appear to be non-binary may be better represented using binary relationships
  - E.g., a ternary relationship parents, relating a child to his/her father and mother, is best replaced by two binary relationships, father and mother
    - Using two binary relationships allows partial information (e.g., only the mother being known)
  - But there are some relationships that are naturally non-binary
    - Example: proj_guide

Converting Non-Binary Relationships to Binary Form
- In general, any non-binary relationship can be represented using binary relationships by creating an artificial entity set.
  - Replace R between entity sets A, B and C by an entity set E, and three relationship sets:
    1. $R_A$, relating E and A
    2. $R_B$, relating E and B
    3. $R_C$, relating E and C
  - Create a special identifying attribute for E
  - Add any attributes of R to E
  - For each relationship $(a_i, b_i, c_i)$ in R, create
    1. a new entity $e_i$ in the entity set E
    2. add $(e_i, a_i)$ to $R_A$
    3. add $(e_i, b_i)$ to $R_B$
    4. add $(e_i, c_i)$ to $R_C$

Converting Non-Binary Relationships (Cont.)
- Also need to translate constraints
  - Translating all constraints may not be possible
  - There may be instances in the translated schema that cannot correspond to any instance of R
    - Exercise: add constraints to the relationships $R_A$, $R_B$ and $R_C$ to ensure that a newly created entity corresponds to exactly one entity in each of entity sets A, B and C
  - We can avoid creating an identifying attribute by making E a weak entity set (described earlier) identified by the three relationship sets

Extended ER Features

Extended E-R Features: Specialization
- Top-down design process; we designate subgroupings within an entity set that are distinctive from other entities in the set.
- These subgroupings become lower-level entity sets that have attributes or participate in relationships that do not apply to the higher-level entity set.
- Depicted by a triangle component labeled ISA (e.g., instructor “is a” person).
- Attribute inheritance: a lower-level entity set inherits all the attributes and relationship participation of the higher-level entity set to which it is linked.

Specialization Example

Extended ER Features: Generalization
- A bottom-up design process: combine a number of entity sets that share the same features into a higher-level entity set.
- Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way.
- The terms specialization and generalization are used interchangeably.

Specialization and Generalization (Cont.)
- Can have multiple specializations of an entity set based on different features.
- E.g., permanent_employee vs. temporary_employee, in addition to instructor vs. secretary
- Each particular employee would be
  - a member of one of permanent_employee or temporary_employee,
  - and also a member of one of instructor, secretary
- The ISA relationship is also referred to as the superclass-subclass relationship

Design Constraints on a Specialization/Generalization
- Constraint on which entities can be members of a given lower-level entity set.
  - condition-defined
    - Example: all customers over 65 years are members of the senior-citizen entity set; senior-citizen ISA person.
  - user-defined
- Constraint on whether or not entities may belong to more than one lower-level entity set within a single generalization.
  - Disjoint
    - an entity can belong to only one lower-level entity set
    - Noted in E-R diagram by having multiple lower-level entity sets link to the same triangle
  - Overlapping
    - an entity can belong to more than one lower-level entity set

Design Constraints on a Specialization/Generalization (Cont.)
- Completeness constraint: specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization.
  - total: an entity must belong to one of the lower-level entity sets
    - noted by adding the keyword “total” in the diagram
  - partial: an entity need not belong to one of the lower-level entity sets

Aggregation
- Consider the ternary relationship proj_guide, which we saw earlier
- Suppose we want to record evaluations of a student by a guide on a project

Aggregation (Cont.)
- Relationship sets eval_for and proj_guide represent overlapping information
  - Every eval_for relationship corresponds to a proj_guide relationship
  - However, some proj_guide relationships may not correspond to any eval_for relationships
    - So we can't discard the proj_guide relationship
- Eliminate this redundancy via aggregation
  - Treat the relationship as an abstract entity
  - Allows relationships between relationships
  - Abstraction of a relationship into a new entity

Aggregation (Cont.)
- Without introducing redundancy, the aggregation diagram represents:
  - A student is guided by a particular instructor on a particular project
  - A student, instructor, project combination may have an associated evaluation

Representing Specialization via Schemas
- Method 1:
  - Form a schema for the higher-level entity
  - Form a schema for each lower-level entity set, including the primary key of the higher-level entity set and local attributes

      schema     attributes
      person     ID, name, street, city
      student    ID, tot_cred
      employee   ID, salary

  - Drawback: getting information about an employee requires accessing two relations, the one corresponding to the low-level schema and the one corresponding to the high-level schema

Representing Specialization as Schemas (Cont.)
- Method 2:
  - Form a schema for each entity set with all local and inherited attributes

      schema     attributes
      person     ID, name, street, city
      student    ID, name, street, city, tot_cred
      employee   ID, name, street, city, salary

  - If the specialization is total, the schema for the generalized entity set (person) is not required to store information
    - It can be defined as a "view" relation containing the union of the specialization relations
  - Drawback: name, street and city may be stored redundantly for people who are both students and employees

Schemas Corresponding to Aggregation (Cont.)
- To represent aggregation, create a schema containing:
  - the primary key of the aggregated relationship,
  - the primary key of the associated entity set, and
  - any descriptive attributes
- The schema for the relationship set eval_for between the aggregation of proj_guide and the entity set evaluation includes:
  1. An attribute for each attribute in the primary keys of the entity set evaluation and of the relationship set proj_guide.
  2. An attribute for each descriptive attribute, if any, of the relationship set eval_for.
- We then transform the relationship sets and entity sets within the aggregated entity set following the rules we have already defined.

Summary of E-R Design Decisions
- The use of an attribute or entity set to represent an object.
- Whether a real-world concept is best expressed by an entity set or a relationship set.
- The use of a ternary relationship versus a pair of binary relationships.
- The use of a strong or weak entity set.
- The use of specialization/generalization: contributes to modularity in the design.
- The use of aggregation: can treat the aggregate entity set as a single unit without concern for the details of its internal structure.
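The two methods for representing specialization as schemas can be made concrete with Python's built-in sqlite3 module. This is a minimal sketch assuming SQLite as the DBMS; the column types and the sample person (ID '1', Ann, Main St, Boston, with tot_cred 30 and salary 50000) are hypothetical. Method 1 has to join person and employee to assemble a full employee record, while Method 2 stores name, street and city twice for someone who is both a student and an employee but lets person be derived as a UNION view, which is only appropriate when the specialization is total.

```python
import sqlite3


def method1(conn: sqlite3.Connection) -> list:
    """Method 1: a schema for the higher-level entity set plus one schema per
    lower-level entity set holding the inherited primary key and local attributes."""
    conn.executescript("""
        CREATE TABLE person   (ID TEXT PRIMARY KEY, name TEXT, street TEXT, city TEXT);
        CREATE TABLE student  (ID TEXT PRIMARY KEY REFERENCES person(ID), tot_cred INTEGER);
        CREATE TABLE employee (ID TEXT PRIMARY KEY REFERENCES person(ID), salary INTEGER);
        -- Hypothetical sample data.
        INSERT INTO person   VALUES ('1', 'Ann', 'Main St', 'Boston');
        INSERT INTO employee VALUES ('1', 50000);
    """)
    # Drawback: a complete employee record needs a join of two relations.
    return conn.execute("""
        SELECT p.ID, p.name, p.street, p.city, e.salary
        FROM person AS p JOIN employee AS e ON p.ID = e.ID
    """).fetchall()


def method2(conn: sqlite3.Connection) -> list:
    """Method 2: one schema per lower-level entity set with all local and
    inherited attributes; person becomes a view (valid if the specialization is total)."""
    conn.executescript("""
        CREATE TABLE student  (ID TEXT PRIMARY KEY, name TEXT, street TEXT, city TEXT, tot_cred INTEGER);
        CREATE TABLE employee (ID TEXT PRIMARY KEY, name TEXT, street TEXT, city TEXT, salary INTEGER);
        CREATE VIEW person AS
            SELECT ID, name, street, city FROM student
            UNION
            SELECT ID, name, street, city FROM employee;
        -- Ann is both a student and an employee, so name/street/city are stored twice.
        INSERT INTO student  VALUES ('1', 'Ann', 'Main St', 'Boston', 30);
        INSERT INTO employee VALUES ('1', 'Ann', 'Main St', 'Boston', 50000);
    """)
    # UNION removes the duplicate row, so the view lists Ann once.
    return conn.execute("SELECT * FROM person").fetchall()


if __name__ == "__main__":
    print(method1(sqlite3.connect(":memory:")))  # [('1', 'Ann', 'Main St', 'Boston', 50000)]
    print(method2(sqlite3.connect(":memory:")))  # [('1', 'Ann', 'Main St', 'Boston')]
```

Each function takes a fresh connection so the two designs do not collide on table names; in a real schema you would pick one method, not both.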
End

Relational Database Design

Normal Forms
- Normal forms defined in relational database theory represent guidelines for record design.
- This presentation conveys an intuitive sense of the intended constraints on record design
  - Because of its informality, it may be imprecise in some technical details
- Normalization rules:
  - are designed to prevent update anomalies and data inconsistencies
  - tend to penalize retrieval efficiency, since data that may have been retrievable from one record in an unnormalized design may have to be retrieved from several records in the normalized form
- There is no obligation to fully normalize all records when actual performance requirements are taken into account
Presentation follows the article: William Kent, "A Simple Guide to Five Normal Forms in Relational Database Theory", Communications of the ACM

1-NF
- First normal form deals with the "shape" of a record type
- Under first normal form, all occurrences of a record type must contain the same number of fields
- First normal form excludes variable repeating fields and groups

  Relation not in 1-NF:
      EmpID  PrjID
      E1     P1, P2
      E2     P2, P3, P4
      E3     P3

  Relation in 1-NF:
      EmpID  PrjID
      E1     P1
      E1     P2
      E2     P2
      E2     P3
      E2     P4
      E3     P3

2-NF
- Under second (and third) normal form, a non-key field must provide a fact about:
  - the key,
  - the whole key, and
  - nothing but the key
- In addition, the record must satisfy 1-NF

2-NF
- 2-NF is violated when a non-key field is a fact about a subset of a key (when the key is composite)
- Consider the following inventory schema of an online book store:
  - Inventory(BookID, Warehouse, Quantity, Warehouse-Address)
- Inventory is not in 2-NF:
  - Why?
    - The key is composite (BookID, Warehouse), but Warehouse-Address is a fact about Warehouse alone

2-NF
- Problems caused by violating 2-NF:
  - The warehouse address is repeated in every record that refers to a book stored in that warehouse
  - If the address of the warehouse changes, every record referring to a book stored in that warehouse must be updated
  - Because of the redundancy, the data might become inconsistent, with different records showing different addresses for the same warehouse
  - If at some point in time there are no books stored in the warehouse, there may be no record in which to keep the warehouse's address

2-NF
- To satisfy 2-NF, the schema Inventory(BookID, Warehouse, Quantity, Warehouse-Address) should be decomposed into (replaced by) the two records:
  - Stocking(BookID, Warehouse, Quantity) and
  - Warehouse(Warehouse, Warehouse-Address)
- Replacing unnormalized schemas with normalized schemas is referred to as normalization (in this case: 2-NF normalization)

2-NF
- A normalized design enhances the integrity of the data by minimizing redundancy and inconsistency
- But at a performance cost for retrieval:
  - Assume we want the addresses of all warehouses stocking a certain book:
    - In the unnormalized form, we search one table
    - With the normalized design, we have to join two tables and search the appropriate pairs

3-NF
- 3-NF is violated when a non-key field is a fact about another non-key field
- Consider the schema Works(EmpID, DepartmentID, Location)
  - EmpID is the primary key
  - Each department is located in one place
  - The Location field is (in addition to being a fact about EmpID) also a fact about DepartmentID, which is not the key

3-NF
- Problems caused by violating 3-NF:
  - The department's location is repeated in the record of every employee assigned to that department
  - If the location of the department changes, every such record must be updated
  - Because of the redundancy, the data might become inconsistent, with different records showing
    different locations for the same department
  - If a department has no employees, there may be no record in which to keep the department's location

3-NF
- To satisfy 3-NF, the schema Works(EmpID, DepartmentID, Location) should be decomposed into the two records:
  - Works(EmpID, DepartmentID)
  - Department(DepartmentID, Location)
- The two schemas are in 2-NF and 3-NF, because every field either:
  - is part of the key, or
  - provides a (single-valued) fact about exactly the whole key and nothing else

Functional Dependencies
- In relational database theory, 2-NF and 3-NF are defined in terms of functional dependencies
- A field Y is "functionally dependent" on a field (or fields) X if it is invalid to have two records with the same X-value but different Y-values
  - i.e., a given X-value must always occur with the same Y-value
- When X is a key, all fields are by definition functionally dependent on X in a trivial way
- 2-NF and 3-NF do not allow functional dependencies in any other (non-trivial) case

Functional Dependencies
- Functional dependencies only exist when the things involved have unique and singular identifiers
- Example:
  - Suppose a person has only one address
  - If we don't provide unique identifiers for people, then there will not be a functional dependency:

      Person       Address
      John Smith   123 Main St., New York
      John Smith   321 Center St., San Francisco

  - Although each person has only one address, a given name can appear with several different addresses (different persons with the same name)
  - A non-unique identifier precludes a functional dependency

Functional Dependencies
- Another example:
  - The address must also be spelled identically (i.e., be a unique identifier)

      Person       Address
      John Smith   123 Main St., New York
      John Smith   123 Main Street, NYC

  - The same person appears to be living at two different addresses
  - A non-unique identifier precludes a functional dependency

Functional Dependencies
- Therefore, even when we assume that Employee is uniquely
  identified by name (reasonable for small firms), the relation instance

      Employee    Father       Father's Address
      Art Smith   John Smith   123 Main St., New York
      Bob Smith   John Smith   123 Main Street, NYC
      Cal Smith   John Smith   321 Center St., San Francisco

  does not violate 3-NF:
  - The father's name cannot be assumed to be a unique identifier
  - The father's address is not a unique identifier, due to misspellings

4-NF and 5-NF
- 4-NF and 5-NF deal with multi-valued facts, which:
  - may correspond to a many-to-many relationship
    - E.g., employees and skills (an employee may have many skills)
  - or to a many-to-one relationship
    - E.g., the children of an employee (assuming only one parent is an employee)

4-NF
- Under 4-NF, a schema should not contain two or more independent multi-valued facts about an entity
- In addition, the schema must satisfy 3-NF
  - The term "independent" is defined on the next slide

4-NF
- Example schema:
  - Employees, skills, and languages, where an employee may have several skills and speak several languages: ESL(Emp, Skill, Lang)
- ESL violates 4-NF
- Why?
  - Skill and Lang are independent
  - A skill of an employee does not depend in any way on any language (there is no direct connection)
  - There is only an indirect connection, because both belong to a common employee

4-NF
- Problem caused by violating 4-NF: uncertainty about how to represent the facts in the relation, since several record layouts are possible:

  Disjoint format:
      Emp    Skill  Language
      Smith  cook
      Smith  speak
      Smith         French
      Smith         German
      Smith         Spanish

  Minimal number of records, with repetitions:
      Emp    Skill  Language
      Smith  cook   French
      Smith  speak  German
      Smith  speak  Spanish

  Minimal number of records, with null values:
      Emp    Skill  Language
      Smith  cook   French
      Smith  speak  German
      Smith  null   Spanish

  A "cross-product" form:
      Emp    Skill  Language
      Smith  cook   French
      Smith  cook   German
      Smith  cook   Spanish
      Smith  speak  French
      Smith  speak  German
      Smith  speak  Spanish

4-NF
- Other problems caused by violating 4-NF:
  - If there are repetitions, then updates have to be done in multiple records, and they could become inconsistent
  - Insertion of a new skill may involve looking for a record with a blank skill, inserting a new record with a possibly blank language, or inserting multiple records pairing the new skill with some or all of the languages
  - Deletion of a skill may involve blanking out the skill field in one or more records (perhaps with a check that this doesn't leave two records with the same language and a blank skill), or deleting one or more records, coupled with a check that the last mention of some language hasn't also been deleted

4-NF
- 4-NF minimizes such update problems
- Decompose ESL(Emp, Skill, Lang) into ES(Emp, Skill) and EL(Emp, Lang)

4-NF
- What about ternary relationships? Does 4-NF mean that we always have to decompose into two-way relationships?

      Emp    Skill  Language
      Smith  cook   French
      Smith  speak  German
      Smith  speak  Spanish

- No!
  - A ternary relationship does not violate 4-NF: here Skill and Language are not independent
- In a ternary relationship the facts are not independent
- Assume there is a direct connection between skill and language
  - A skill is performed in a specific language
    - E.g., cook French cuisine

5-NF
- 5-NF deals with cases where information can be reconstructed from smaller pieces of information that can be maintained with less redundancy
- 2-NF, 3-NF, and 4-NF also serve this purpose, but 5-NF generalizes to cases not covered by the others
- No comprehensive exposition is given here; the central concept is illustrated with an example

5-NF
- Example:
  - agents represent companies
  - companies make products
  - agents sell products
  - record which agent sells which product for which company

      Agent  Comp  Product
      Smith  Ford  Car
      Smith  GM    Truck

  - Notice that:
    - Smith does not sell Ford trucks or GM cars
    - we need the combination of all three fields to know which combinations are valid

5-NF
- Assume the rule:
  - if an agent sells a certain product type,
  - and he represents a company making that product type,
  - then he sells the products of this type made by this company
- Example facts:
  - Ford and GM produce cars and trucks
  - Smith sells cars and trucks, Jones sells only cars
  - Smith represents Ford and GM, Jones represents Ford

      Agent  Comp  Product
      Smith  Ford  Car
      Smith  Ford  Truck
      Smith  GM    Car
      Smith  GM    Truck
      Jones  Ford  Car

5-NF
- But we can reconstruct all the true facts from a normalized form consisting of three separate schemas, each containing two fields:

  Smith represents Ford and GM, Jones represents Ford:
      Agent  Comp
      Smith  Ford
      Smith  GM
      Jones  Ford

  Ford and GM produce cars and trucks:
      Comp  Product
      Ford  Car
      Ford  Truck
      GM    Car
      GM    Truck

  Smith sells cars and trucks, Jones sells only cars:
      Agent  Product
      Smith  Car
      Smith  Truck
      Jones  Car

- These three schemas are in 5-NF, whereas the corresponding three-field schema (previous slide) is not
- A schema is in 5-NF when its information content cannot be reconstructed
  from schemas each having fewer fields (excluding the case where all the smaller schemas have the same key)

5-NF
- Notice: 5-NF does not differ from 4-NF unless there exists a symmetric constraint (such as the rule about agents, companies, and products)
  - when there is no such constraint, a schema in 4-NF is always in 5-NF as well
- Advantage of 5-NF:
  - certain redundancies can be eliminated
  - the fact that Smith sells cars is recorded only once;
  - in the unnormalized form it may be repeated many times

  Normalized:
      Agent  Product
      Smith  Car
      Smith  Truck
      Jones  Car

  Unnormalized:
      Agent  Comp  Product
      Smith  Ford  Car
      Smith  Ford  Truck
      Smith  GM    Car
      Smith  GM    Truck
      Jones  Ford  Car

Exercise
- The denormalized table below
  - stores data for products purchased by people online
  - also stores their employer information
    - assume that a person can only have one employer

      SSN        User Name  Product1  Product2  More Products  Employer Name  Employer Address
      332345432  Amy        M                                  Google         1 California drive
      666666666  Kevin      A         B         C,D            Facebook       22nd Street Sanfrancisco
      919919919  Raj        D                                  Google         1 California drive

Database Normalization Tutorial with example: http://dotnetanalysis.blogspot.de/2012/01/database-normalization-sql-server.html

Exercise: 1-NF
- Only one value per column
- No multiple columns for a one-to-many relationship
- Which problems do you see in the previous table?

      SSN        User Name  Employer Name  Employer Address          Product
      332345432  Amy        Google         1 California drive        M
      666666666  Kevin      Facebook       22nd Street Sanfrancisco  A
      666666666  Kevin      Facebook       22nd Street Sanfrancisco  B
      666666666  Kevin      Facebook       22nd Street Sanfrancisco  C
      666666666  Kevin      Facebook       22nd Street Sanfrancisco  D
      919919919  Raj        Google         1 California drive        D

- SSN and Product together have been chosen as the primary key

Exercise: 2-NF
- All non-primary-key columns in the table should depend on the entire primary key
- Which problems do you see in the previous table?
  - The UserName column does not depend on the entire primary key.
    It depends only on part of the primary key (SSN).
  - EmployerName and EmployerAddress do not depend on the entire primary key. They depend only on part of the primary key (SSN).

      SSN        User Name
      332345432  Amy
      666666666  Kevin
      919919919  Raj

      SSN        Employer Name  Employer Address
      332345432  Google         1 California drive
      666666666  Facebook       22nd Street Sanfrancisco
      919919919  Google         1 California drive

      SSN        Product
      332345432  M
      666666666  A
      666666666  B
      666666666  C
      666666666  D
      919919919  D

- In 2-NF, every non-key column depends on the entire primary key of its table, not on just part of the primary key

Exercise: 3-NF
- No indirect (transitive) dependency between non-key fields
- Which problems do you see in the previous tables?
  - EmployerAddress depends on EmployerName, which is not the key

      SSN        User Name
      332345432  Amy
      666666666  Kevin
      919919919  Raj

      SSN        Employer Name
      332345432  Google
      666666666  Facebook
      919919919  Google

      Employer Name  Employer Address
      Google         1 California drive
      Facebook       22nd Street Sanfrancisco

      SSN        Product
      332345432  M
      666666666  A
      666666666  B
      666666666  C
      666666666  D
      919919919  D

Summary of Design Process
- An initial set of data elements and records has to be developed as candidates for normalization
- Then the factors affecting normalization have to be assessed:
  - Single-valued vs. multi-valued facts
  - Dependency on the entire key
  - Independent vs. dependent facts
  - The presence of mutual constraints
  - The presence of non-unique or non-singular representations
- Finally, the desirability of normalization has to be assessed in terms of its performance impact on retrieval applications

End
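The normalization exercise can be checked mechanically against the functional-dependency definition (no two records with the same X-value but different Y-values). The Python sketch below is illustrative only: the helper names holds_fd, project and decompose_and_rejoin and the space-free column names are hypothetical, while the rows are those of the exercise's 1-NF table. It tests that SSN determines UserName and EmployerName, that EmployerName determines EmployerAddress (the dependency that 3-NF moves into its own table), that SSN alone does not determine Product, and that joining the four 3-NF tables back together reproduces exactly the original rows.

```python
# Rows of the exercise table in 1-NF; the primary key is (SSN, Product).
COLUMNS = ("SSN", "UserName", "EmployerName", "EmployerAddress", "Product")
ROWS = [
    dict(zip(COLUMNS, r))
    for r in [
        ("332345432", "Amy", "Google", "1 California drive", "M"),
        ("666666666", "Kevin", "Facebook", "22nd Street Sanfrancisco", "A"),
        ("666666666", "Kevin", "Facebook", "22nd Street Sanfrancisco", "B"),
        ("666666666", "Kevin", "Facebook", "22nd Street Sanfrancisco", "C"),
        ("666666666", "Kevin", "Facebook", "22nd Street Sanfrancisco", "D"),
        ("919919919", "Raj", "Google", "1 California drive", "D"),
    ]
]


def holds_fd(rows, lhs, rhs):
    """X -> Y holds if no two rows have the same X-value but different Y-values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def project(rows, attrs):
    """Projection onto attrs; the result is a set, so duplicate records collapse."""
    return {tuple(row[a] for a in attrs) for row in rows}


def decompose_and_rejoin(rows):
    """Split into the four 3-NF tables of the exercise, then join them back."""
    user = dict(project(rows, ["SSN", "UserName"]))                      # SSN -> UserName
    works_for = dict(project(rows, ["SSN", "EmployerName"]))             # SSN -> EmployerName
    employer = dict(project(rows, ["EmployerName", "EmployerAddress"]))  # EmployerName -> EmployerAddress
    purchases = project(rows, ["SSN", "Product"])                        # the original key
    # Dictionary lookups act as the joins; valid only because the FDs above hold.
    rebuilt = {
        (ssn, user[ssn], works_for[ssn], employer[works_for[ssn]], product)
        for ssn, product in purchases
    }
    return rebuilt, project(rows, COLUMNS)


if __name__ == "__main__":
    print(holds_fd(ROWS, ["SSN"], ["UserName", "EmployerName"]))  # True
    print(holds_fd(ROWS, ["EmployerName"], ["EmployerAddress"]))  # True
    print(holds_fd(ROWS, ["SSN"], ["Product"]))                   # False
    rebuilt, original = decompose_and_rejoin(ROWS)
    print(rebuilt == original)                                    # True
```

A dependency check on sample data can only refute an FD, never prove it: the rules themselves (e.g. "a person can only have one employer") come from the business assumptions stated in the exercise, not from the rows.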