CSMI14: Database Management Systems
Dr. R. Bala Krishnan
Asst. Prof., Dept. of CSE, NIT, Trichy – 620 015
Ph: 999 470 4853
E-mail: balakrishnan@nitt.edu

Course Content

Books
• Text Books (TB)
  TB1: Abraham Silberschatz, Henry F. Korth, S. Sudarshan, “Database System Concepts”, Fifth Edition, Tata McGraw Hill, 2006.
  TB2: C. J. Date, A. Kannan, S. Swamynathan, “An Introduction to Database Systems”, Eighth Edition, Pearson Education, 2006.
• Reference Books (RB)
  RB1: Ramez Elmasri, Shamkant B. Navathe, “Fundamentals of Database Systems”, Fourth Edition, Pearson/Addison Wesley, 2007.
  RB2: Raghu Ramakrishnan, “Database Management Systems”, Third Edition, McGraw Hill, 2003.
  RB3: S. K. Singh, “Database Systems Concepts, Design and Applications”, First Edition, Pearson Education, 2006.

Books & Chapters
Unit   Book   Chapter
1      TB1    1
1_     TB1    6
2      RB2    3, 4
2_     TB1    3
3      RB2    19
4      RB2    16, 17, 18
5      TB1    11, 12
• https://www.databasestar.com/sql-practice/
• http://sqlfiddle.com/#!9/7379d5/1

Unit II

RDBMS
• Relational model is very simple and elegant
• A database is a collection of one or more relations -> Each relation is a table with rows and columns
  - Simple tabular representation enables even novice users to understand the contents of a database
  - Permits the use of simple, high-level languages to query the data
• Advantage
  - Simple data representation
  - Even complex queries can be expressed easily
• SQL -> Standard language for creating, manipulating, and querying data in a relational DBMS; its Data Definition Language (DDL) subset is used to define tables

RDBMS
• Main construct for representing data in the relational model is a relation

Student
ID    Name       Department   Year
123   Bala       CSE          1
456   Krishnan   EEE          2
789   Karthik    CSE          1

• Relation consists of a relation schema and a relation instance
• Relation schema describes the column heads for the table, and the relation instance is the table itself
• Schema specifies the relation's name, the name of each field (or column, or attribute), and the domain of each field
• Domain is referred to in a relation schema
by the domain name and has a set of associated values
• Set of values associated with domain string is the set of all character strings

RDBMS
• An instance of a relation is a set of tuples/records, in which each tuple has the same number of fields as the relation schema
• A relation instance can be thought of as a table in which each tuple is a row, and all rows have the same number of fields

[Figure: an instance of the Students relation]

• Order in which the rows are listed is not important
• If the fields are named, as in our schema definitions and figures depicting relation instances, the order of fields does not matter either
• An alternative convention is to list fields in a specific order and refer to a field by its position. Eg: sid -> 1; login -> 3; here the order is important
• In SQL, the named fields convention is used in statements that retrieve tuples and the ordered fields convention is commonly used when inserting tuples

RDBMS
• Domain Constraint
  - A relation schema specifies the domain of each field or column in the relation instance
  - Values that appear in a column must be drawn from the domain associated with that column
• Domain constraints are so fundamental -> "relation instance" is taken to mean a relation instance that satisfies the domain constraints in the relation schema
• Degree, also called arity, of a relation is the number of fields
• Cardinality of a relation instance is the number of tuples in it
• A relational database is a collection of relations with distinct relation names
  - University database -> Students, Faculty, Courses, Rooms, Enrolled, Teaches, and Meets_In
• Relational database schema is the collection of schemas for the relations in the database
• An instance of a relational database is a collection of relation instances, one per relation schema in the database schema

Creating and Modifying Relations
• SQL language standard uses the word table to denote relation
• Subset of SQL that supports the creation, deletion, and modification of tables is called the Data Definition Language
(DDL)

[Figures, Creating and Modifying Relations: the empty Students table (sid, name, login, age, gpa); the table after inserting the row (53688, Smith, smith@ee, 18, 3.2); the same row after an update, with age 19 and gpa 2.2]

Integrity Constraints over Relations
• A database is only as good as the information stored in it
• DBMS must help prevent the entry of incorrect information
• An integrity constraint (IC) is a condition specified on a database schema that restricts the data that can be stored in an instance of the database
• If a database instance satisfies all the integrity constraints specified on the database schema, it is a legal instance
• DBMS enforces integrity constraints -> Permits only legal instances to be stored in the database
• Integrity constraints are specified and enforced at different times
  - When the DBA or end user defines a database schema, he or she specifies the constraints that must hold on any instance of this database
  - When a database application is run, the DBMS checks for violations and disallows changes to the data that violate the specified ICs
• We discuss the integrity constraints, other than domain constraints, that a DBA or user can specify in the relational model

Integrity Constraints over Relations
• Domain Constraint
• Primary Key Constraint
• Foreign Key Constraint

Key Constraints
• Consider the Students relation and the constraint that no two students have the same student id -> This IC is an example of a key constraint
• A set of fields that uniquely identifies a tuple according to a key constraint is called a Candidate Key / Key for the relation

Student
ID    Name       Department   Year
123   Bala       CSE          II
456   Bala       ECE          II
789   Krishnan   CSE          I

Field sets to examine:
• {ID}
• {Name, Department}
• {ID, Name} -> A superkey, but not a candidate key, because its subset {ID} is already a unique identifier

• Candidate Key Defn
  - Two distinct tuples in a legal instance (an instance that satisfies all ICs, including the key constraint) cannot have identical values in all the fields of a key
  - No subset of the set of fields in a key is a
unique identifier for a tuple

Key Constraints
• When specifying a key constraint, the DBA or user must be sure that this constraint will not prevent them from storing a 'correct' set of tuples
• Every relation is guaranteed to have a key
• Since a relation is a set of tuples, the set of all fields is always a superkey
• If other constraints hold, some subset of the fields may form a key, but if not, the set of all fields is a key
• Out of all the available candidate keys, a database designer can identify a primary key
• DBMS may create an index with the primary key fields as the search key, to make the retrieval of a tuple given its primary key value efficient
• If a key constraint is given a name and is violated, the constraint name is returned and can be used to identify the error

Students
sid   name   login   age   gpa
123   Bala   b@nit   23    8.6
456   Bala   b@nit   25    8.9

Field sets that identify each tuple uniquely in this instance: {sid}, {name, age}, {name, gpa}, {login, age}, {login, gpa}, {age, gpa}

Miscellaneous
Index (Created by DBMS Software for Quick Access)
Primary Key   Location
123           101
456           120

customer
customer_id   customer_name   customer_street   customer_city
123           Bala            NITT              Trichy
456           Krishnan        NITT              Trichy

[Figure: storage locations 101 to 108 holding the first record character by character: 1, 2, 3, \t, B, a, l, a]

Foreign Key Constraints
• Information stored in a relation is linked to the information stored in another relation
• If one of the relations is modified, the other must be checked, and perhaps modified, to keep the data consistent -> foreign key constraint

Students
sid   name   login   age   gpa
123   Bala   b@nit   23    8.6
456   Bala   b@nit   25    8.9

Enrolled
studid   cid    grade
123      CSE1   B
456      ECE1   B

• The studid field of Enrolled is called a foreign key and refers to Students
• Foreign key in the referencing table must match the primary key of the referenced table -> Same number of columns and compatible data types (the column names may differ)
1. Students Relation
   • Insert -> No Problem
   • Delete -> If the sid is not present in Enrolled relation, then allow deletion; else, either don’t allow or delete from both the relations
2.
Enrolled Relation
   • Insert -> Allow if present in Students relation
   • Delete -> Allow

Foreign Key Constraints
• A foreign key could refer to the same relation
• Declare Partner column to be a foreign key referring to Students
• Every student could then have a partner, and the partner field contains the partner's sid

Students (sid -> Primary Key; Partner -> Foreign Key)
sid   name       login        age   gpa   Partner
123   Bala       b@nit        23    8.6   456
456   Bala       b@nit        25    8.9   123
789   Krishnan   k@nitt.edu   24    8.7   NULL
458   Selva      s@nitt.edu   25    7.4   NULL

• Appearance of null in a foreign key field does not violate the foreign key constraint; NULL -> Unknown or Not Applicable
• Null values are not allowed to appear in a primary key field (because the primary key fields are used to identify a tuple uniquely)

Foreign Key Constraints
• Foreign key constraint states that every studid value in Enrolled must also appear in Students
  - studid in Enrolled is a foreign key referencing Students
• Every studid value in Enrolled must appear as the value in the primary key field, sid, of Students
• Incidentally, the primary key constraint for Enrolled states that a student has exactly one grade for each course he or she is enrolled in
• If we want to record more than one grade per student per course, we should change the primary key constraint

General Constraints
• Domain, primary key, and foreign key constraints are considered to be a fundamental part of the relational data model
• Eg: Require that student ages be within a certain range of values
• Given such an IC specification, the DBMS rejects inserts and updates that violate the constraint
  - Very useful in preventing data entry errors

[Figure: a legal instance and an illegal instance]

• Eg: Require that every student whose age is greater than 18 must have a gpa greater than 3
• Table constraint -> Associated with a single table and checked whenever that table is modified
• Assertion Constraint -> Involves several tables and is checked whenever any of these tables is modified

Miscellaneous
CREATE
TABLE sailors (
    sid    int,
    sname  varchar(20),
    rating int,
    primary key(sid),
    CHECK(rating >= 1 AND rating <= 10),   -- Table Constraint
    CHECK((select count(s.sid) from sailors s)
        + (select count(b.bid) from boats b) < 100)   -- Assertion Constraint (involves both sailors and boats)
);

sailors
sid   sname      rating
123   bala       8
456   krishnan   10

Boats
bid   bname
789   Black Pearl
879   Diamond

• Syntax -> CREATE ASSERTION [ assertion_name ] CHECK ( [ condition ] );
https://www.geeksforgeeks.org/difference-between-assertions-and-triggers-in-dbms/

Enforcing Integrity Constraints
• ICs are specified when a relation is created and enforced when a relation is modified
• Impact of DOMAIN, PRIMARY KEY, and UNIQUE constraints is straightforward
  - If an insert, delete, or update command causes a violation, it is rejected
• Every potential IC violation is generally checked at the end of each SQL statement execution, although it can be deferred until the end of the transaction executing the statement
• Deletion does not cause a violation of domain, primary key or unique constraints
• Insertion and Update can cause violations

Enforcing Integrity Constraints
• Impact of foreign key constraints is more complex
  - SQL sometimes tries to rectify a foreign key constraint violation instead of simply rejecting the change
1. Students Relation
   • Insert -> No Problem
   • Delete -> If the sid is not present in Enrolled relation, then allow deletion; else, either don’t allow or delete from both the relations
   • Update (sid) -> Allow if the sid value being changed is not present in Enrolled relation
2.
Enrolled Relation
   • Insert -> Allow if present in Students relation
   • Delete -> Allow
   • Update (studid) -> Allow if the updated value is present in Students relation

Enforcing Integrity Constraints
• SQL provides several alternative ways to handle foreign key violations
• We must consider three basic questions: what to do when an Enrolled row is inserted with a studid that is not in Students, when a Students row is deleted, and when the sid of a Students row is updated

Enforcing Integrity Constraints
• SQL allows us to choose any of the four options on DELETE and UPDATE: NO ACTION, CASCADE, SET DEFAULT, SET NULL

[SQL figure: definition of Enrolled with a primary key ending in cid and a foreign key clause specifying the actions on DELETE and UPDATE]

• Cascade -> A delete or update of a referenced Students row is repeated on the Enrolled rows that refer to it
• No Action -> Nothing is changed in either table; the offending delete or update is simply rejected

Enforcing Integrity Constraints

[SQL figure: definition of Enrolled in which the foreign key column has DEFAULT ‘53666’ and the action is SET DEFAULT; a second variant uses SET NULL]

• Specification of a default value or null is appropriate only in certain situations
• Correct solution in this example is to also delete all enrollment tuples for the deleted student (that is, CASCADE) or to reject the update

Transactions and Constraints
• A program that runs against a database is called a transaction
• Can contain several statements (queries, inserts, updates, etc.) that access the database

Transaction 1 (Account No. 101)
  1. Start
  2. Display Balance
  3. Deposit Rs. 100/-
  4. Display Balance
  5. Commit

Transaction 2 (Account No. 102)
  1. Start
  2. Withdraw Rs. 1000/-
  3. Display Balance
  4. Commit

• If (the execution of) a statement in a transaction violates an integrity constraint, should the DBMS detect this right away or should all constraints be checked together just before the transaction completes?
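The question can be made concrete with a small experiment. The following is a minimal sketch, assuming Python's built-in sqlite3 module as the DBMS (the slides do not prescribe one); the helper name try_enroll_before_student and the trimmed-down Students and Enrolled schemas are illustrative only. It inserts an Enrolled row before the Students row it refers to, inside a single transaction, once with the foreign key checked after every statement and once with the check postponed to commit time.

```python
import sqlite3


def try_enroll_before_student(fk_mode: str) -> str:
    """Insert an Enrolled row before its Students row inside one transaction.

    fk_mode is appended to the foreign key clause: "" means the constraint is
    checked after each statement, "DEFERRABLE INITIALLY DEFERRED" means it is
    checked at COMMIT.
    """
    # isolation_level=None: we issue BEGIN / COMMIT / ROLLBACK ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE Students (sid INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE Enrolled ("
        " studid INTEGER, cid TEXT, grade TEXT,"
        " PRIMARY KEY (studid, cid),"
        f" FOREIGN KEY (studid) REFERENCES Students (sid) {fk_mode})"
    )
    conn.execute("BEGIN")
    try:
        # At this point student 123 does not exist yet.
        conn.execute("INSERT INTO Enrolled VALUES (123, 'CSE1', 'B')")
        conn.execute("INSERT INTO Students VALUES (123, 'Bala')")
        conn.execute("COMMIT")
        return "committed"
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return "rejected"
    finally:
        conn.close()


immediate_result = try_enroll_before_student("")
deferred_result = try_enroll_before_student("DEFERRABLE INITIALLY DEFERRED")
print(immediate_result, deferred_result)
```

With per-statement checking the first INSERT fails at once, so immediate_result is "rejected". With the check postponed, the database is inconsistent only in the middle of the transaction and consistent again by COMMIT, so deferred_result is "committed".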
• By default, a constraint is checked at the end of every SQL statement that could lead to a violation, and if there is a violation, the statement is rejected
  - This approach is sometimes too inflexible

Transactions and Constraints
• Every student is required to have an honors course, and every course is required to have a grader, who is some student

Students (sid, name, login, age, honors, gpa)
Courses (cid, cname, credits, grader)

• Whenever a Students tuple is inserted, a check is made to see if the honors course is in the Courses relation, and whenever a Courses tuple is inserted, a check is made to see that the grader is in the Students relation
• How are we to insert the very first course or student tuple? One cannot be inserted without the other
• Only way to accomplish this insertion is to defer the constraint checking that would normally be carried out at the end of an INSERT statement

Transactions and Constraints
• SQL allows a constraint to be in DEFERRED or IMMEDIATE mode

    SET CONSTRAINTS ConstraintFoo DEFERRED
    INSERT INTO Students VALUES (123, 'Bala', 'b@nitt.edu', 24, 'CSMI24', 8.0)
    INSERT INTO Courses VALUES ('CSMI24', 'DBMS', 3, 123)

Students (sid, name, login, age, honors, gpa)
Courses (cid, cname, credits, grader)

• A constraint in deferred mode is checked at commit time
• In our example, the foreign key constraints on Students and Courses can both be declared to be in deferred mode
• We can then insert a sid with a nonexistent honors (temporarily making the database inconsistent), insert the corresponding cid (restoring consistency), then commit and check that both constraints are satisfied

Querying Relational Data
• A relational database query is a question about the data, and the answer consists of a new relation containing the result
• Eg: We might want to find all students younger than 18 or all students enrolled in Reggae203
• A query language is a specialized language for writing queries

    SELECT * FROM Students S WHERE S.age < 18

• * -> Retain all fields of selected tuples in the result
• S -> Variable that takes on the
value of each tuple in Students, one tuple after the other
• S.age < 18 -> Specifies that we want to select only tuples in which the age field has a value less than 18
• Domain of a field restricts the operations that are permitted on field values, in addition to restricting the values that can appear in the field

Querying Relational Data
• A query can extract a subset of the fields of each selected tuple
• Order in which we perform these operations does matter
  - If we remove unwanted fields first, we cannot check the condition S.age < 18, which involves one of those fields
• Eg (combining Students and Enrolled): If there is a Students tuple S and an Enrolled tuple E such that S.sid = E.studid (so that S describes the student who is enrolled in E) and E.grade = 'A', then print the student's name and the course id

ER to Relational
• ER model is convenient for representing an initial, high-level database design
• Given an ER diagram describing a database, a standard approach is taken to generate a relational database schema that closely approximates the ER design
• How to translate an ER diagram into a collection of tables with associated constraints -> Relational database schema

ER to Relational
• Entity Sets to Tables
  - An entity set is mapped to a relation in a straightforward way
  - Each attribute of the entity set becomes an attribute of the table
  - We know both the domain of each attribute and the (primary) key of an entity set
• Relationship Sets (without Constraints) to Tables
  - To represent a relationship, we must be able to identify each participating entity and give values to the descriptive attributes of the relationship
  - Attributes of the relation include:
    Primary key attributes of each participating entity set, as foreign key fields
    Descriptive attributes of the relationship set

ER to Relational
• Set of nondescriptive attributes is a superkey for the relation
• If there are no key constraints, this set of attributes is a candidate key
• Each department has offices in several
locations and we want to record the locations at which each employee works

ER to Relational
Works_In2
ssn   did   address   since
123   456   CSE       2009
123   456   CSE       2011
789   456   CSE       2010

• address, did and ssn fields together form the primary key, so none of them can take on null values
  - Constraint ensures that these fields uniquely identify a department, an employee, and a location in each tuple of Works_In2
  - Can also specify that a particular action is desired when a referenced Employees, Departments, or Locations tuple is deleted

ER to Relational

[Figure: Reports_To relationship with the roles Supervisor and Subordinate]

• Role indicators supervisor and subordinate are used to create meaningful field names in the CREATE statement for the Reports_To table
• Need to explicitly name the referenced field of Employees because the field name (ssn) differs from the name(s) of the referring field(s) (supervisor…ssn, subordinate…ssn)

Translating Relationship Sets with Key Constraints
• A relationship set involves “n” entity sets and some “m” of them are linked via arrows in the ER diagram
  - Key for any one of these m entity sets constitutes a key for the relation to which the relationship set is mapped
  - Have “m” candidate keys, and one of these should be designated as the primary key

Manages
ssn   did   since
345   123   2009
345   911   2006
321   123   2009
234   456   2010
567   789   2011

• Table corresponding to Manages has the attributes ssn, did, since
• Each department has at most one manager -> No two tuples can have the same did value but differ on the ssn value
  - did is itself a key for Manages; indeed, the set {did, ssn} is not a key (it is not minimal)

Translating Relationship Sets with Key Constraints
Employees
ssn   name   lot
321   Bala   Full-Time

Departments
did   dname   budget
456   CSE     10,000

Manages
ssn   did         since
123   456 (CSE)   1999
789   345 (ECE)   2006

Select name,dname from Manages where did = 456 and ssn = 123

• Second approach to translating a relationship set with key constraints is often superior because it avoids creating a distinct table for the
relationship set
  - Idea is to include the information about the relationship set in the table corresponding to the entity set with the key, taking advantage of the key constraint
• In the Manages example, because a department has at most one manager, we can add the key fields of the Employees tuple denoting the Manager and the since attribute to the Departments tuple

Translating Relationship Sets with Key Constraints
Departments (extended with ssn and since)
did   dname   budget   ssn    since
123   CSE     10,000   456    1999
789   ECE     20,000   NULL   2006

Select name, dname from Dept_Mgr where did = 123 and ssn = 123

• Eliminates the need for a separate Manages relation, and queries asking for a department's manager can be answered without combining information from two relations
• Drawback: Space could be wasted if several departments have no managers -> Added fields would have to be filled with null values
• First translation (using a separate table for Manages) avoids this inefficiency, but some important queries require us to combine information from two relations, which can be a slow operation
• Conclusion: If a relationship set involves “n” entity sets and some “m” of them are linked via arrows in the ER diagram, the relation corresponding to any one of the “m” sets can be augmented to capture the relationship

Translating Relationship Sets with Participation Constraints
• Every department is required to have a manager, due to the participation constraint, and at most one manager, due to the key constraint

Miscellaneous
Employees
ssn   name    lot
123   Bala    Full-Time
456   Selva   Part-Time

Departments
did   dname   budget
789   CSE     10,000
321   ECE     20,000

Dept_Mgr
did   dname   budget   ssn   since
789   CSE     10,000   123   2009
321   ECE     20,000   123   2010

Dept_Mgr
did   dname   budget   ssn   since
789   CSE     10,000   123   2009
321   ECE     20,000   456   2010

Translating Relationship Sets with Participation Constraints
• Captures the participation constraint that every department must have a manager
• ssn cannot take on null values
  - Each tuple of Dept_Mgr
identifies a tuple in Employees (who is the manager)
• NO ACTION specification, which is the default and need not be explicitly specified
  - Ensures that an Employees tuple cannot be deleted while it is pointed to by a Dept_Mgr tuple
  - If we wish to delete such an Employees tuple, we must first change the Dept_Mgr tuple to have a new employee as manager

Translating Weak Entity Sets
• A weak entity set always participates in a one-to-many binary relationship and has a key constraint and total participation
• Weak entity has only a partial key
• When an owner entity is deleted, we want all owned weak entities to be deleted
• A Dependents entity can be identified uniquely only if we take the key of the owning Employees entity and the pname of the Dependents entity
• Dependents entity must be deleted if the owning Employees entity is deleted

Translating Weak Entity Sets

[SQL figure: table definition for the weak entity set, with a NOT NULL field and the CASCADE option]

• CASCADE option ensures that information about an employee's policy and dependents is deleted if the corresponding Employees tuple is deleted

Translating Class Hierarchies
• Two basic approaches to handle ISA hierarchies

Translating Class Hierarchies
• Approach 1
  - We can map each of the entity sets Employees, Hourly_Emps, and Contract_Emps to a distinct relation
  - Employees relation is created as usual
  - Relation for Hourly_Emps includes the hourly_wages and hours_worked attributes of Hourly_Emps
  - It also contains the key attributes of the superclass (ssn, in this example), which serve as the primary key for Hourly_Emps, as well as a foreign key referencing the superclass (Employees)
  - For each Hourly_Emps entity, the values of the name and lot attributes are stored in the corresponding row of the superclass (Employees)
  - Note that if the superclass tuple is deleted, the delete must be cascaded to Hourly_Emps

Translating Class Hierarchies
• Approach 1

Employee
ssn   name      lot
123   Bala      Hourly_Emps
456   Selva     Contract_Emps
789   Karthik   NULL

Hourly_Emps
ssn   hourly_wages   hours_worked
123   500            8

Contract_Emps
ssn   contractid
456   321

Query:
  1. Find the list of all employees (Employee Table)
  2. Find the details of all Hourly_Emps (Hourly_Emps + Employee Tables)
  3. Find the details of all Contract_Emps (Contract_Emps + Employee Tables)

Translating Class Hierarchies
• Approach 2
  - Alternatively, we can create just two relations, corresponding to Hourly_Emps and Contract_Emps
  - Relation for Hourly_Emps includes all the attributes of Hourly_Emps as well as all the attributes of Employees (i.e., ssn, name, lot, hourly_wages, hours_worked)
  - Relation for Contract_Emps includes all the attributes of Contract_Emps as well as all the attributes of Employees (i.e., ssn, name, lot, contractid)

Hourly_Emps
ssn   name   lot           hourly_wages   hours_worked
123   Bala   Hourly_Emps   500            8

Contract_Emps
ssn   name    lot             contractid
456   Selva   Contract_Emps   321

Missing Tuple: 789 Karthik NULL

Query:
  1. Find the list of all employees (Hourly_Emps + Contract_Emps Tables)
  2. Find the details of all Hourly_Emps (Hourly_Emps)
  3. Find the details of all Contract_Emps (Contract_Emps)

Translating Class Hierarchies
• First approach is general and always applicable
  - Queries in which we want to examine all employees and do not care about the attributes specific to the subclasses are handled easily using the Employees relation
  - Queries in which we want to examine, say, hourly employees, may require us to combine Hourly_Emps (or Contract_Emps, as the case may be) with Employees to retrieve name and lot
• Second approach is not applicable if we have employees who are neither hourly employees nor contract employees, since there is no way to store such employees
  - If an employee is both an Hourly_Emps and a Contract_Emps entity, then the name and lot values are stored twice
  - A query that needs to examine all employees must now examine two relations
  - On the other hand, a query that needs to examine only hourly employees can now do so by examining just one relation
• Choice between these
approaches clearly depends on the semantics of the data and the frequency of common operations

Translating ER Diagrams with Aggregation
Employees
ssn    name       lot
C123   Bala       Full-Time
D123   Krishnan   Part-Time

Departments
did   dname   budget
456   CSE     10,000
789   ECE     20,000

Projects
pid   Started_on   pbudget
1     1.1.2009     5,000
2     1.2.2010     3,000

Sponsors
did   pid   since
456   1     2009
789   2     2010

Translating ER Diagrams with Aggregation
CREATE TABLE Monitors (
    ssn   CHAR(10),
    did   CHAR(10),
    pid   CHAR(10),
    until CHAR(20),
    PRIMARY KEY (ssn, did, pid),
    FOREIGN KEY (ssn) REFERENCES Employees,
    FOREIGN KEY (did) REFERENCES Departments,
    FOREIGN KEY (pid) REFERENCES Projects
);

Monitors
ssn    did   pid   until
C123   456   1     2010
D123   456   1     2010

• Monitors Relationship Set -> Create a relation with the following attributes: the key attributes of Employees (ssn), the key attributes of Sponsors (did, pid), and the descriptive attributes of Monitors (until)
• What about Sponsors relationship set? Should we have it or not?
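One way to explore this question is to build both relations and see what each of them can record. The sketch below assumes Python's built-in sqlite3 module, and it departs from the CREATE TABLE statement above in one respect that is an assumption of this example: Monitors references Sponsors through a composite foreign key on (did, pid) instead of referencing Departments and Projects separately, and those two tables are omitted for brevity. The rows are the instances shown in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(10) PRIMARY KEY, name CHAR(20), lot CHAR(20));
CREATE TABLE Sponsors (
    did CHAR(10), pid CHAR(10), since CHAR(20),
    PRIMARY KEY (did, pid)
);
-- Assumption: Monitors points at a sponsorship (did, pid), not at the
-- department and the project independently.
CREATE TABLE Monitors (
    ssn CHAR(10), did CHAR(10), pid CHAR(10), until CHAR(20),
    PRIMARY KEY (ssn, did, pid),
    FOREIGN KEY (ssn) REFERENCES Employees (ssn),
    FOREIGN KEY (did, pid) REFERENCES Sponsors (did, pid)
);
INSERT INTO Employees VALUES ('C123', 'Bala', 'Full-Time'),
                             ('D123', 'Krishnan', 'Part-Time');
INSERT INTO Sponsors VALUES ('456', '1', '2009'), ('789', '2', '2010');
INSERT INTO Monitors VALUES ('C123', '456', '1', '2010'),
                            ('D123', '456', '1', '2010');
""")

# Sponsorships that no employee monitors: only Sponsors knows about them.
unmonitored = conn.execute(
    "SELECT S.did, S.pid FROM Sponsors S"
    " WHERE NOT EXISTS (SELECT 1 FROM Monitors M"
    "                   WHERE M.did = S.did AND M.pid = S.pid)"
).fetchall()

# Monitoring a (did, pid) pair that is not a sponsorship is rejected.
try:
    conn.execute("INSERT INTO Monitors VALUES ('C123', '999', '9', '2012')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(unmonitored, orphan_rejected)
```

Here unmonitored is [('789', '2')]: the sponsorship of project 2 by department 789 has no monitor, so it is recorded only because Sponsors is stored as a relation of its own, together with its since value.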
Translating ER Diagrams with Aggregation
Monitors
ssn    did   pid   until
C123   456   1     2010
D123   456   1     2010

Monitors (Partial Participation)
ssn    did   pid   until
C123   456   1     2010
D123   456   1     2010
NULL   789   2     2011

Sponsors
did   pid   since
456   1     2009
789   2     2010

Translating ER Diagrams with Aggregation
Monitors
ssn    did   pid   until
C123   456   1     2010
D123   456   1     2010

Monitors (Total Participation)
ssn    did   pid   until
C123   456   1     2010
D123   456   1     2010
D123   789   2     2011

Sponsors
did   pid   since
456   1     2009
789   2     2010

Translating ER Diagrams with Aggregation
• Sponsors Relationship Set
  - Has attributes pid, did, and since
  - Need it (in addition to Monitors) for two reasons
    Have to record the descriptive attributes (in our example, since) of the Sponsors relationship
    Not every sponsorship has a monitor, and thus some (pid, did) pairs in the Sponsors relation may not appear in the Monitors relation
  - If Sponsors has no descriptive attributes and has total participation in Monitors, every possible instance of the Sponsors relation can be obtained from the (pid, did) columns of Monitors -> Sponsors can be dropped

Views
• View is a table whose rows are not explicitly stored in the database but are computed as needed from a view definition

B-Students (Data Viewed by Students)
name   sid   course

• View B-Students has three fields called name, sid, and course with the same domains as the fields name and sid in Students and cid in Enrolled
• If the optional arguments name, sid, and course are omitted from the CREATE VIEW statement, then the column names name, sid, and cid are inherited
• Whenever B-Students is used in a query, the view definition is first evaluated to obtain the corresponding instance of B-Students, then the rest of the query is evaluated treating B-Students like any other relation referred to in the query

select * from B-Students

Views, Data Independence and Security
• Physical schema for a relational database describes how the relations in the conceptual schema are stored,
in terms of the file organizations and indexes used
• Conceptual schema is the collection of schemas of the relations stored in the database
• While some relations in the conceptual schema can also be exposed to applications, that is, be part of the external schema of the database, additional relations in the external schema can be defined using the view mechanism
• View mechanism thus provides the support for logical data independence in the relational model
  - Can be used to define relations in the external schema that mask changes in the conceptual schema of the database from applications
• Eg: If the schema of a stored relation is changed, we can define a view with the old schema and applications that expect to see the old schema can now use this view
• Views are also valuable in the context of security
  - Can define views that give a group of users access to just the information they are allowed to see

Miscellaneous
View Level (Application):   (Emp_Id, Position, DoJ)   (Emp_Id, DoJ, Salary)
Logical Level:              (Emp_ID, Name, Position, DoJ, Salary)
Physical Level:             Hard Disk

Miscellaneous
View Level (Application):   (Emp_Id, Position, DoJ)   (Emp_Id, DoJ, Salary)   (Emp_ID, Name, Position, DoJ, Salary)
Logical Level:              (Emp_ID, Name, Position, DoJ, Salary, Age)
Physical Level:             Hard Disk

Updates on Views
• A view can be used just like any other relation in defining a query

Students
sid   name   login   age   gpa
456   Bala   b@ni    24    4.5

CREATE VIEW GoodStudents (sid, gpa)
    AS SELECT S.sid, S.gpa
    FROM Students S
    WHERE S.gpa > 3.0
    WITH CHECK OPTION CONSTRAINT GPA

select * from GoodStudents

GoodStudents
sid   gpa
456   4.5

Insert into GoodStudents values(123, 4)

Students
sid   name   login   age    gpa
456   Bala   b@ni    24     4.5
123   NULL   NULL    NULL   4

• An INSERT or UPDATE may change the underlying base table so that the resulting (i.e., inserted or modified) row is not in the view
  - Insert into GoodStudents values(123, 2.8) -> Allowed by default, i.e., when the view is defined without WITH CHECK OPTION

Need to Restrict View Updates

[Figure: instances of the Students and Clubs tables; Clubs includes the fields cname and mname]

• Find the names and logins of students with a gpa greater than
3 who belong to at least one club, along with the club name and the date they joined the club

[Figure: the ActiveStudents view]

• Delete the row (Smith, smith@ee, Hiking, 1997) from ActiveStudents. How are we to do this?
  -> ActiveStudents rows are not stored explicitly but computed as needed from the Students and Clubs tables using the view definition
  -> Disallow such updates on views

Need to Restrict View Updates
B-Students (Data Viewed by Students)
name   sid   course

• To insert a tuple, say (Dave, 50000, Reggae203), into B-Students, we can simply insert a tuple (Reggae203, B, 50000) into Enrolled since there is already a tuple for sid 50000 in Students
• To insert (John, 55000, Reggae203), we have to insert (Reggae203, B, 55000) into Enrolled and also insert (55000, John, null, null, null) into Students
• View schema contains the primary key fields of both underlying base tables -> otherwise, we would not be able to support insertions into this view
• To delete a tuple from the view B-Students, we can simply delete the corresponding tuple from Enrolled

Destroying / Altering Tables and Views
• Destroying Table
  - If we decide that we no longer need a base table and want to destroy it (i.e., delete all the rows and remove the table definition information), we can use the DROP TABLE command
  - Eg: DROP TABLE Students RESTRICT -> Destroys the Students table unless some view or integrity constraint refers to Students; if so, the command fails
  - If RESTRICT is replaced by CASCADE -> Students is dropped and any referencing views or integrity constraints are (recursively) dropped as well
  - One of these two keywords must always be specified
• Destroying View
  - View can be dropped using the DROP VIEW command, which is just like DROP TABLE

Miscellaneous
Students
Name   ID      Age   Year
123    BT456   18    I
456    MT789   19    II

DROP TABLE Students RESTRICT
• DELETE FROM Students S where S.ID = 'BT456'
• DELETE FROM Students S where S.ID = 'MT789'

Destroying / Altering Tables and Views
• Alter Table
  - ALTER TABLE
modifies the structure of an existing table

Students
Name  ID     Age  Year  Maidenname
123   BT456  18   I     NULL
456   MT789  19   II    NULL

  - Students is modified to add the Maidenname column, and all existing rows are padded with null values in this column
  - ALTER TABLE can also be used to delete columns and add or drop integrity constraints on a table
  - Dropping columns is treated very similarly to dropping tables or views

Preliminaries
• Inputs and outputs of a query are relations
• A query is evaluated using instances of each input relation and it produces an instance of the output relation

Students
sid  name  login  age  gpa
456  Bala  b@ni   24   4.5

• Use field names to refer to fields
  Select name, age from Students where sid = 456
  name  age
  Bala  24
• Alternatively, always list the fields of a given relation in the same order and refer to fields by position rather than by field name
  Select 2, 4 from Students where sid = 456
  name  age
  Bala  24

Relational Algebra
• Selection
• Projection
• Set Operations
  - Union
  - Intersection
  - Difference or Set Difference
  - Cross-product

Selection
[Figure: selection applied to instance S]
• Selection operator σ -> Specifies the tuples to retain through a selection condition
• Selection condition is a Boolean combination (i.e., an expression using the logical connectives ˄ and ˅) of terms that have the form attribute op constant or attribute1 op attribute2
• op is one of the comparison operators <, <=, =, ≠, >=, or >
• Reference to an attribute can be by position (of the form .i or i) or by name (of the form .name or name)

Projection
[Figure: projection applied to instance S]
• Subscript sname, rating specifies the fields to be retained
• Other fields are 'projected out'
[Figure: projection of S on age; the age values in S are 35.0, 35.0, 35.0 and 55.5]
• Although three sailors are aged 35, a single tuple with age=35.0 appears in the result of the projection
• In practice, real systems often omit the expensive step of eliminating duplicate tuples, leading to relations that are multisets
• Our discussion of relational algebra assumes that duplicate elimination is always done so that relations
are always sets of tuples

Projection
• Since the result of a relational algebra expression is always a relation, we can substitute an expression wherever a relation is expected
• Eg: We can compute the names and ratings of highly rated sailors by combining two of the preceding queries
[Figure: combined selection and projection on instance S]

Set Operations
• Union (A U B)
• Intersection (A ∩ B)
• Difference (A – B)
[Figure: Venn diagrams of A and B for each operation]

Set Operations
• Union (A U B)

Student
ID  Name     Course Taken
12  Bala     CSMI23
23  Karthik  CSHO23

Employee
ID  Name   Course Handling
45  Selva  CSMI23
12  Sai    CSOE17

Union
ID  Name     Course Taken
12  Bala     CSMI23
12  Sai      CSOE17
23  Karthik  CSHO23
45  Selva    CSMI23

• A U B returns a relation instance containing all tuples that occur in either relation instance A or relation instance B (or both)
• A and B must be union-compatible, and the schema of the result is defined to be identical to the schema of A
• Two relation instances are said to be union-compatible if the following conditions hold:
  - Both have the same number of fields
  - Corresponding fields, taken in order from left to right, have the same domains
• Note that field names are not used in defining union-compatibility
  - For convenience, we will assume that the fields of A U B inherit names from A, if the fields of A have names

Set Operations
• Intersection (A ∩ B)

Student (2011 Batch)
ID  Name     Course Taken
12  Bala     CSMI23
23  Karthik  CSHO23

Student (2011 & 2012 Batch)
ID  Name     Course Taken
12  Bala     CSMI23
23  Karthik  CSHO23
45  Selva    CSHO23

Intersection
ID  Name     Course Taken
12  Bala     CSMI23
23  Karthik  CSHO23

• A ∩ B returns a relation instance containing all tuples that occur in both A and B
• Relations A and B must be union-compatible
• Schema of the result is defined to be identical to the schema of A

Set Operations
• Difference or Set Difference (A - B)

Student (2011 Batch)
ID  Name     Course Taken
12  Bala     CSMI23
23  Karthik  CSHO23

Student (2011 & 2012 Batch)
ID  Name   Course Taken
12  Bala   CSMI23
45  Selva  CSHO23

Difference or Set Difference
ID  Name     Course Taken
23  Karthik  CSHO23

• A - B returns a relation instance containing all tuples that occur in A but not in B
• Relations A and B must be union-compatible
• Schema of the result is defined to be identical to the schema of A

Set Operations
• Cross Product (A x B)

Student (2011 Batch)
ID  Name     Course Taken
12  Bala     CSMI23
23  Karthik  CSHO23

Student (2011 & 2012 Batch)
ID  Name     CGPA
23  Karthik  7
13  Kumar    8.9

Cross Product (2 x 2 = 4 tuples)
1   2        Course Taken  4   5        CGPA
12  Bala     CSMI23        23  Karthik  7
12  Bala     CSMI23        13  Kumar    8.9
23  Karthik  CSHO23        23  Karthik  7
23  Karthik  CSHO23        13  Kumar    8.9

Set Operations
• Cross Product (A x B)
[Figure: cross product example with 3 x 2 = 6 result tuples; fields 1 and 5 are referred to by position]

Set Operations
• Cross Product (A x B)
  - A x B returns a relation instance whose schema contains all the fields of A followed by all the fields of B
  - Result of A x B contains one tuple (the concatenation of a tuple of A and a tuple of B) for each pair of tuples from A and B
  - Cross-product operation is sometimes called Cartesian product
  - Fields of A x B inherit names from the corresponding fields of A and B
  - It is possible for both A and B to contain one or more fields having the same name
    -> Creates a naming conflict
    -> Corresponding fields in A x B are unnamed and are referred to solely by position

Renaming
• Name conflicts can arise in some cases -> A x B
• Convenient to be able to give names explicitly to the fields of a relation instance that is defined by a relational algebra expression
• Renaming operator ρ
• Expression ρ(C(1 → StudID, 2 → Name, 4 → ID, 5 → Name1), A x B) returns a relation that contains the tuples with the following schema:
  - C(StudID: Integer, Name: String, Course Taken: String, ID: Integer, Name1: String, CGPA: Real)

Cross Product: fields 1, 2, Course Taken, 4, 5, CGPA (rows as in the cross product example above)

Cross Product (with Renaming)
StudID  Name     Course Taken  ID  Name1    CGPA
12      Bala     CSMI23        23  Karthik  7
12      Bala     CSMI23        13  Kumar    8.9
23      Karthik  CSHO23        23  Karthik  7
23      Karthik  CSHO23        13  Kumar    8.9

Joins
• Used to
combine information from two or more relations
• Although a join can be defined as a cross-product followed by selections and projections, joins arise much more frequently in practice than plain cross-products
• Result of a cross-product is typically much larger than the result of a join
• Very important to recognize joins and implement them without materializing the underlying cross-product

Joins
• Condition Join
• Equi Join
• Left Join
• Right Join
• Natural Join

Condition Join
[Figure: condition join of A and B on A.sid < B.sid; fields 1 and 5 are referred to by position]

Condition Join
• Join condition is identical to a selection condition in form
• Condition join A ⋈c B is defined to be a cross-product followed by a selection: A ⋈c B = σc(A x B)
• Condition c can refer to attributes of both A and B
• Reference to an attribute of a relation, say, A, can be by position (of the form A.i) or by name (of the form A.name)

Equi Join
• Join condition of A ⋈ B consists solely of equalities -> A.name1 = B.name2
• Some redundancy in retaining both attributes in the result
• For join conditions that contain only such equalities, the join operation is refined by doing an additional projection in which B.name2 is dropped
• Join operation with this refinement -> Equijoin
• Schema of the result of an equijoin contains the fields of A followed by the fields of B that do not appear in the join conditions
• If this set of fields in the result relation includes two fields that inherit the same name from A and B, they are unnamed in the result relation
[Figure: equijoin of A and B on A.sid = B.sid]

Equi Join
• A ⋈ B on A.ID = B.ID

Student (2011 Batch)
ID  Name     Course Taken
12  Bala     CSMI23
23  Karthik  CSHO23

Student (2011 & 2012 Batch)
ID  Name     CGPA
23  Karthik  7
13  Kumar    8.9

Cross Join
1   2        Course Taken  4   5        CGPA
12  Bala     CSMI23        23  Karthik  7
12  Bala     CSMI23        13  Kumar    8.9
23  Karthik  CSHO23        23  Karthik  7
23  Karthik  CSHO23        13  Kumar    8.9

Equi Join
ID  2        Course Taken  4        CGPA
23  Karthik  CSHO23        Karthik  7

Natural Join
• Special case of join operation A ⋈ B
• Equalities are specified on all fields having
the same name in A and B
  -> Omit the join condition
  - Default is that the join condition is a collection of equalities on all common fields
• Has the nice property that the result is guaranteed not to have two fields with the same name

A: Student (2011 Batch)
ID  Name     Course Taken
12  Bala     CSMI23
23  Karthik  CSHO23

B: Student (2011 & 2012 Batch)
ID  Name     CGPA
23  Karthik  7
13  Kumar    8.9

Natural Join (No Condition)
ID  Name     Course Taken  CGPA
23  Karthik  CSHO23        7

• If the two relations have no attributes in common, A ⋈ B is simply the cross-product

Natural Join
• A ⋈ B, with A and B as on the previous slide

Cross Join
1   2        Course Taken  4   5        CGPA
12  Bala     CSMI23        23  Karthik  7
12  Bala     CSMI23        13  Kumar    8.9
23  Karthik  CSHO23        23  Karthik  7
23  Karthik  CSHO23        13  Kumar    8.9

Natural Join (No Condition)
ID  Name     Course Taken  CGPA
23  Karthik  CSHO23        7

Natural Join
• A ⋈ B when A and B have no field names in common

A: Student (2011 Batch)
ID  Name     Course Taken
12  Bala     CSMI23
23  Karthik  CSHO23

B: Student (2011 & 2012 Batch)
StudID  Sname    CGPA
23      Karthik  7
13      Kumar    8.9

A ⋈ B = Cross Join
ID  Name     Course Taken  StudID  Sname    CGPA
12  Bala     CSMI23        23      Karthik  7
12  Bala     CSMI23        13      Kumar    8.9
23  Karthik  CSHO23        23      Karthik  7
23  Karthik  CSHO23        13      Kumar    8.9

Left Outer Join

A
Emp_ID  Name      Age
112     Bala      25
114     Krishnan  45
116     Kumaran   23
118     Sai       21

B
Emp_ID  Position
112     Asst. Prof.
116     Prof.
114     Asso. Prof.
120     Prof.

Left Outer Join
Emp_ID  Name      Age  Position
112     Bala      25   Asst. Prof.
114     Krishnan  45   Asso. Prof.
116     Kumaran   23   Prof.
118     Sai       21   NULL

Right Outer Join (A and B as above)
Emp_ID  Position     Name      Age
112     Asst. Prof.  Bala      25
116     Prof.        Kumaran   23
114     Asso. Prof.  Krishnan  45
120     Prof.        NULL      NULL

Full Join (A and B as above)
Emp_ID  Name      Age   Position
112     Bala      25    Asst. Prof.
114     Krishnan  45    Asso. Prof.
116     Kumaran   23    Prof.
118     Sai       21    NULL
120     NULL      NULL  Prof.

Preliminaries
• Data-Definition Language (DDL) -> SQL DDL provides commands for defining relation schemas, deleting relations, and modifying relation schemas
• Interactive Data-Manipulation Language (DML) -> SQL DML includes a query language based on both the relational algebra and the tuple relational calculus. It also includes commands to insert tuples into, delete tuples from, and modify tuples in the database
• Integrity -> SQL DDL includes commands for specifying integrity constraints that the data stored in the database must satisfy. Updates that violate integrity constraints are disallowed
• View Definition -> SQL DDL includes commands for defining views
• Transaction Control -> SQL includes commands for specifying the beginning and ending of transactions
• Embedded SQL and Dynamic SQL -> Embedded and dynamic SQL define how SQL statements can be embedded within general-purpose programming languages such as C, C++, Java, PL/I, Cobol, Pascal and Fortran
• Authorization -> SQL DDL includes commands for specifying access rights to relations and views

Data Definition Language
• Set of relations in a database must be specified to the system by means of a data definition language (DDL)
• SQL DDL allows specification of not only a set of relations, but also information about each relation
  - Schema for each relation
  - Domain of values associated with each attribute
  - Integrity constraints
  - Set of indices to be maintained for each relation
  - Security and authorization information for each relation
  - Physical storage structure of each relation on disk

Basic Domain Types
• char(n) or character(n) -> A fixed-length character string with user-specified length n
  Eg: name char(5); name = bala -> □bala (□ marks the padding blank)
• varchar(n) or character varying(n) -> A variable-length character string with user-specified maximum length
n
  Eg: name varchar(5); name = bala -> bala (no padding)
• int or integer -> An integer (a finite subset of the integers that is machine dependent)
• smallint -> A small integer (a machine-dependent subset of the integer domain type)
• numeric(p, d) -> A fixed-point number with user-specified precision. The number consists of p digits (plus a sign), and d of the p digits are to the right of the decimal point. Thus, numeric(3,1) allows 44.5 to be stored exactly, but neither 444.5 nor 0.32 can be stored exactly in a field of this type
• real, double precision -> Floating-point and double-precision floating-point numbers with machine-dependent precision
• float(n) -> A floating-point number, with precision of at least n digits
• SQL also provides special data types, such as various forms of the date type -> DD-MM-YEAR; MM-DD-YEAR, etc.

Basic Schema Definition in SQL
• delete from account where id = 123
• drop table account CASCADE
• Many database systems do not support dropping of attributes, although they will allow an entire table to be dropped
• Drop command deletes not only all tuples of the relation, but also the schema for the relation. After a relation is dropped, no tuples can be inserted into it unless it is re-created with the create table command

Basic Structure of SQL Queries
• A relational database consists of a collection of relations, each of which is assigned a unique name
• NULL -> Indicates that the value either is unknown or does not exist
• NOT NULL -> Used to specify which attributes cannot be assigned null values
• Basic structure of an SQL expression consists of three clauses
  - select clause corresponds to the projection operation of the relational algebra. It is used to list the attributes desired in the result of a query
  - from clause corresponds to the Cartesian-product operation of the relational algebra. It lists the relations to be scanned in the evaluation of the expression
  - where clause corresponds to the selection predicate of the relational algebra. It consists of a predicate involving attributes of the relations that appear in the from clause

Basic Structure of SQL Queries
• If the where clause is omitted, the predicate P is true
• Unlike the result of a relational-algebra expression, the result of the SQL query may contain multiple copies of some tuples
• Three Steps
  - SQL forms the Cartesian product of the relations named in the from clause
  - Performs a relational-algebra selection using the where clause predicate
  - Projects the result onto the attributes of the select clause

Select Clause

loan
ID  Branch_name
12  Trichy
23  Trichy
45  Trichy
67  Chennai

• A query that selects Branch_name from loan will list each branch_name once for every tuple in which it appears in the loan relation -> Duplicates are not removed (No. of occurrences of duplicates may differ)
  Result: Trichy, Trichy, Trichy, Chennai
• To force the elimination of duplicates, we insert the keyword distinct after select
  Result: Trichy, Chennai
• Use keyword all to specify explicitly that duplicates are not removed
  Result: Trichy, Trichy, Trichy, Chennai
• Asterisk symbol "*" can be used to denote "all attributes"
• select clause may also contain arithmetic expressions involving the operators +, -, *, and / operating on constants or attributes of tuples

Where Clause
• SQL uses the logical connectives and, or, and not -> Rather than the mathematical symbols ˄, ˅ and ¬ in the where clause
• Operands of the logical connectives can be expressions involving the comparison operators <, <=, >, >=, = and <> (<> means not equal to)
• SQL allows us to use the comparison operators to compare strings and arithmetic expressions, as well as special types, such as date types
• SQL includes a between comparison operator; similarly, a not between comparison operator also exists

From Clause
• from clause by itself defines a Cartesian product of the relations in the clause
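
The select, where and from clauses described above can be tried out on the loan example. What follows is a minimal sketch, not part of the original slides: it assumes Python's built-in sqlite3 module as the SQL engine, and the branch relation with its branch_city column is hypothetical, introduced only so that the from clause has two relations to combine.

```python
import sqlite3

# In-memory database; the loan rows mirror the example table on the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (id INTEGER, branch_name TEXT)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?)",
    [(12, "Trichy"), (23, "Trichy"), (45, "Trichy"), (67, "Chennai")],
)

# Hypothetical second relation, used only to show a two-relation from clause.
conn.execute("CREATE TABLE branch (branch_name TEXT, branch_city TEXT)")
conn.executemany(
    "INSERT INTO branch VALUES (?, ?)",
    [("Trichy", "Tiruchirappalli"), ("Chennai", "Chennai")],
)

# select without distinct keeps duplicates (the same as select all).
with_duplicates = [row[0] for row in conn.execute(
    "SELECT branch_name FROM loan ORDER BY id")]

# select distinct forces duplicate elimination.
without_duplicates = sorted(row[0] for row in conn.execute(
    "SELECT DISTINCT branch_name FROM loan"))

# A from clause with two relations and no where clause is a Cartesian product.
product_size = conn.execute(
    "SELECT COUNT(*) FROM loan, branch").fetchone()[0]

# The where predicate selects from that product; select then projects.
matched = conn.execute(
    "SELECT loan.id, branch.branch_city "
    "FROM loan, branch "
    "WHERE loan.branch_name = branch.branch_name "
    "AND loan.id BETWEEN 20 AND 50 "
    "ORDER BY loan.id").fetchall()

print(with_duplicates)
print(without_duplicates)
print(product_size)
print(matched)
```

The three-step description (product, selection, projection) is the logical meaning of the query; a real engine such as SQLite is free to evaluate it differently as long as the result is the same.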
• Since the natural join is defined in terms of a Cartesian product, a selection, and a projection, it is a relatively simple matter to write an SQL expression for the natural join
• SQL uses the notation relation-name.attribute-name, as does the relational algebra, to avoid ambiguity in cases where an attribute appears in the schema of more than one relation
• SQL includes extensions to perform natural joins and outer joins in the from clause

Rename Operation
• SQL provides a mechanism for renaming both relations and attributes: old-name as new-name
  - as clause can appear in both the select and from clauses
• Names of the attributes in the result are derived from the names of the attributes in the relations in the from clause
• Cannot always derive names in this way, for several reasons
  - First, two relations in the from clause may have attributes with the same name, in which case an attribute name is duplicated in the result
  - Second, if we used an arithmetic expression in the select clause, the resultant attribute does not have a name
  - Third, even if an attribute name can be derived from the base relations as in the preceding example, we may want to change the attribute name in the result

Tuple Variables
• Tuple variables are defined in the from clause by way of the as clause
• Tuple variables are most useful for comparing two tuples in the same relation
• Observe that we could not use the notation branch.asset, since it would not be clear which reference to branch is intended
• SQL permits us to use the notation (v1, v2, v3, …, vn) to denote a tuple of arity (or degree) n containing values v1, v2, v3, …, vn
• Comparison operators can be used on tuples, and the ordering is defined lexicographically
  - (a1, a2) <= (b1, b2) is true, if a1 < b1, or (a1 = b1) ˄ (a2 <= b2)
• Two tuples are equal if all their attributes are equal

String Operations
• SQL specifies strings by enclosing them in single quotes -> 'Perryridge'
• A single quote character that is part of a string can be specified by using two single quote
characters -> 'It''s right'
• Most used operation on strings is pattern matching using the operator like
• Describe patterns by using two special characters
  - Percent (%) -> % character matches any substring
  - Underscore (_) -> _ character matches any character
• Patterns are case sensitive
• SQL expresses patterns by using the like comparison operator

String Operations
• For patterns to include the special pattern characters (that is, % and _), SQL allows the specification of an escape character
• Escape character is used immediately before a special pattern character to indicate that the special pattern character is to be treated like a normal character
• SQL allows us to search for mismatches instead of matches by using the not like comparison operator
• SQL also permits a variety of functions on character strings
• Concatenating (||), extracting substrings, finding the length of strings, converting strings to uppercase (upper()) and lowercase (lower()), etc.

Ordering the Display of Tuples
• SQL offers the user some control over the order in which tuples in a relation are displayed
• order by clause causes the tuples in the result of a query to appear in sorted order
• By default, the order by clause lists items in ascending order
• Ordering can be performed on multiple attributes
• Suppose that we wish to list the entire loan relation in descending order of amount
• If several loans have the same amount, we order them in ascending order by loan number
• To fulfill an order by request, SQL must perform a sort
• Sorting a large number of tuples may be costly -> Do it only when necessary

Duplicates
• SQL formally defines not only what tuples are in the result of a query, but also how many copies of each of those tuples appear in the result
• Given multiset relations r1 and r2
• For example, suppose that relations r1 with schema (A, B) and r2 with schema (C) are the following multisets:
  [Example multisets r1 and r2 not recovered]
• Then ∏B(r1) would be {(a), (a)}, whereas ∏B(r1) x r2 would be
  [Result multiset not recovered]
• We
can now define how many copies of each tuple occur in the result of an SQL query

Set Operations
• SQL operations union, intersect, and except operate on relations and correspond to the relational-algebra operations U, ∩, and –
• Relations must be union compatible
• Union Operation
  [Example tables depositor(customer_name) and borrower(customer_name); values as extracted: Bala, Bala, Bala, Bala, Bala, Selva, Sai]
  - Union operation automatically eliminates duplicates
  - If we want to retain all duplicates, we must write union all
  - Number of duplicate tuples in the result is equal to the total number of duplicates that appear in both depositor and borrower

Set Operations
• Intersection Operation
  [Example tables depositor(customer_name) and borrower(customer_name); values as extracted: Bala, Bala, Bala, Bala, Bala, Selva, Sai]
  - Intersect operation automatically eliminates duplicates
  - If we want to retain all duplicates, we must write intersect all
  - Number of duplicate tuples that appear in the result is equal to the minimum number of duplicates in both depositor and borrower
• Except Operation
  [Example tables depositor(customer_name) and borrower(customer_name); values as extracted: Bala, Bala, Bala, Bala, Sai, Selva, Sai]
  - Except operation automatically eliminates duplicates
  - If we want to retain all duplicates, we must write except all
  - Number of duplicate copies of a tuple in the result is equal to the number of duplicate copies of the tuple in depositor minus the number of duplicate copies of the tuple in borrower, provided that the difference is positive

Aggregate Functions
• Functions that take a collection (a set or multiset) of values as input and return a single value
• SQL offers five built-in aggregate functions
  - Average: avg
  - Minimum: min
  - Maximum: max
  - Total: sum
  - Count: count
• Input to sum and avg must be a collection of numbers, but the other operators can operate on collections of nonnumeric data types, such as strings

Aggregate Functions
• SQL does not allow the use of distinct with count (*)
• It is legal to use distinct with max and min even though the result does not change
• We
can use the keyword all in place of distinct to specify duplicate retention, but, since all is the default, there is no need to do so
• If a where clause and a having clause appear in the same query, SQL applies the predicate in the where clause first
• Tuples satisfying the where predicate are then placed into groups by the group by clause
• SQL then applies the having clause, if it is present, to each group
• Removes the groups that do not satisfy the having clause predicate
• Select clause uses the remaining groups to generate tuples of the result of the query

NULL Values
• SQL allows the use of null values to indicate absence of information about the value of an attribute
• Can use the special keyword null in a predicate to test for a null value
• Predicate is not null tests for the absence of a null value
• Use of a null value in arithmetic and comparison operations causes several complications
  - Result of an arithmetic expression (involving, for example +, -, * or /) is null, if any of the input values is null
  - and -> Result of true and unknown is unknown, false and unknown is false, while unknown and unknown is unknown
  - or -> Result of true or unknown is true, false or unknown is unknown, while unknown or unknown is unknown
  - not -> Result of not unknown is unknown

NULL Values

amount
NULL
NULL
NULL

  count(*) = count({NULL, NULL, NULL}) = 3
  sum(amount) = sum({ }) = NULL
  count(amount) = count({ }) = 0
• All aggregate functions except count (*) ignore null values in their input collection
• As a result of null values being ignored, the collection of values may be empty
  - Count of an empty collection is defined to be 0
  - All other aggregate operations return a value of null when applied on an empty collection
• Effect of null values on some of the more complicated SQL constructs can be subtle

Views
• View -> Any relation that is not part of the logical model, but is made visible to a user as a virtual relation
• Possible to support a large number of views on top of
any given set of actual relations
  create view v as <query expression>

Views
• If a view relation is computed and stored, it may become out of date if the relations used to define it are modified
• To avoid this, views are usually implemented as follows
  - When we define a view, the database system stores the definition of the view itself
  - Wherever a view relation appears in a query, it is replaced by the stored query expression
  - Whenever we evaluate the query, the view relation gets recomputed
• Certain database systems allow view relations to be stored, but they make sure that, if the actual relations used in the view definition change, the view is kept up to date -> Materialized Views
• Process of keeping the views up to date -> View Maintenance
• Applications that use a view frequently benefit from the use of materialized views, as do applications that demand fast response to certain view-based queries

Modification of the Database
• A delete command operates on only one relation
• If we want to delete tuples from several relations, we must use one delete command for each relation
• Delete statement first tests each tuple in the relation account to check whether the account has a balance less than the average at the bank
• Then, all tuples that fail the test (that is, accounts with a lower-than-average balance) are deleted
• Performing all the tests before performing any deletion is important
  - If some tuples are deleted before other tuples have been tested, the average balance may change, and the final result of the delete would depend on the order in which the tuples were processed

Modification of the Database
• Evaluate the select statement fully before we carry out any insertions

account
no   name      balance
123  Bala      10,000
234  Krishnan  15,000
456  Sai       18,000

• If we carry out some insertions even as the select statement is being evaluated, a request such as the above might insert an infinite number of tuples

Modification of the Database
• Change a value in a tuple without changing all values in the tuple
• SQL provides a case construct, which we can use to perform both the updates with a single update statement, avoiding the problem with order of updates
• Operation returns result_i, where i is the first of pred1, pred2, . . . , predn that is satisfied
• If none of the predicates is satisfied, the operation returns result0
• Case statements can be used in any place where a value is expected

Modification of the Database
• Although views are a useful tool for queries, they present serious problems if we express updates, insertions, or deletions with them
• Difficulty is that a modification to the database expressed in terms of a view must be translated to a modification to the actual relations in the logical model of the database
• Insertion must be represented by an insertion into the relation loan, since loan is the actual relation from which the database system constructs the view loan-branch
• To insert a tuple into loan, we must have some value for amount
  - Reject the insertion, and return an error message to the user
  - Insert a tuple (L-37, "Perryridge", null) into the loan relation

Modification of the Database
• Only possible method of inserting tuples into the borrower and loan relations is to insert ("Johnson", null) into borrower and (null, null, 1900) into loan
• Update does not have the desired effect, since the view relation loan_info still does not include the tuple (Johnson, 1900)
• No way to update the relations borrower and loan by using nulls to get the desired update on loan_info

Modification of the Database
• Because of problems such as these, modifications are generally not permitted on view relations, except in limited cases
• Different database systems specify different conditions under
which they permit updates on view relations -> Refer to the system manual
• An SQL view is said to be updatable (that is, inserts, updates or deletes can be applied on the view) if the following conditions are all satisfied:
  - From clause has only one database relation
  - Select clause contains only attribute names of the relation, and does not have any expressions, aggregates, or distinct specification
  - Any attribute not listed in the select clause can be set to null
  - Query does not have a group by or having clause
• insert into downtown_account values (123, 'Trichy', 10000) -> Do not allow
• insert into downtown_account values (123, 'Downtown', 10000) -> Allow
• Problem still remains

Modification of the Database
• By default, SQL would allow the above update to proceed
• Views can be defined with a with check option clause at the end of the view definition
  - If a tuple inserted into the view does not satisfy the view's where clause condition, the insertion is rejected by the database system
  - Updates are similarly rejected if the new value does not satisfy the where clause conditions

Transactions
• Transaction consists of a sequence of query and/or update statements
• SQL standard specifies that a transaction begins implicitly when an SQL statement is executed
• One of the following SQL statements must end the transaction
  - Commit work commits the current transaction
  - Rollback work causes the current transaction to be rolled back
• Commit is similar, in a sense, to saving changes to a document that is being edited, while rollback is similar to quitting the edit session without saving changes
• Case of power outage or other system crash -> Rollback occurs when the system restarts
• Allow multiple SQL statements to be enclosed between the keywords begin atomic ...
end
• All the statements between the keywords then form a single transaction

Joins

Join Types and Conditions
• SQL Join operations take two relations and return another relation as the result
• Although outer-join expressions are typically used in the from clause, they can be used anywhere that a relation can be used
• Each of the variants of the join operations in SQL consists of a join type and a join condition
  - Join condition defines which tuples in the two relations match and what attributes are present in the result of the join
  - Join type defines how tuples in each relation that do not match any tuple in the other relation (based on the join condition) are treated
• Use of a join condition is mandatory for outer joins, but is optional for inner joins (if it is omitted, a Cartesian product results)

Join Types and Conditions
• Keyword natural appears before the join type whereas the on and using conditions appear at the end of the join expression
• Keywords inner and outer are optional, since the rest of the join type enables us to deduce whether the join is an inner join or an outer join
• Meaning of the join condition natural, in terms of which tuples from the two relations match, is straightforward
• Ordering of the attributes in the result of a natural join
  - Join attributes (that is, the attributes common to both relations) appear first, in the order in which they appear in the left-hand-side relation
  - Next all nonjoin attributes of the left-hand-side relation, and finally all nonjoin attributes of the right-hand-side relation
• Right outer join is symmetric to the left outer join
  - Tuples from the right-hand-side relation that do not match any tuple in the left-hand-side relation are padded with nulls and are added to the result of the right outer join

Join Types and Conditions
• Join condition using (A1, A2, . . , An) is similar to the natural-join condition
  - Except that the join attributes are the attributes, A1, A2, . .
, An, rather than all attributes that are common to both relations
  - Attributes, A1, A2, . . , An, must consist of only attributes that are common to both relations, and they appear only once in the result of the join

Join Types and Conditions
• Full outer join is a combination of the left and right outer-join types
• Query -> Find all customers who have an account but no loan at the bank

Join Types and Conditions
• Query -> Find all customers who have either an account or a loan (but not both) at the bank
• SQL-92 also provides two other join types -> Cross Join and Union Join
• Cross Join is equivalent to an inner join without a join condition
• Union Join is equivalent to a full outer join on the "false" condition -> That is, where the inner join is empty

Join Types and Conditions

Employee
EmpID  Name
1      Ferguson
2      Frost
3      Toyon

Projects
ProjectName     EmpID
X-63 Structure  1
X-64 Structure  1
X-63 Guidance   2
X-64 Guidance   2
X-63 Telemetry  3
X-64 Telemetry  3

Skills
Skill                EmpID
Mechanical Design    1
Aerodynamic Loading  1
Analog Design        2
Gyroscope Design     2
Digital Design       3
R/F Design           3

Union join of Employee, Projects and Skills (rows as recovered from the slide)
E.EmpID  Name      P.EmpID  ProjectName     S.EmpID  Skill
1        Ferguson  NULL     NULL            NULL     NULL
NULL     NULL      1        X-63 Structure  NULL     NULL
NULL     NULL      1        X-64 Structure  NULL     NULL
NULL     NULL      NULL     NULL            1        Mechanical Design
NULL     NULL      NULL     NULL            1        Aerodynamic Loading
2        Frost     NULL     NULL            NULL     NULL
NULL     NULL      2        X-63 Guidance   NULL     NULL
NULL     NULL      2        X-64 Guidance   NULL     NULL
NULL     NULL      NULL     NULL            2        Analog Design
NULL     NULL      NULL     NULL            2        Gyroscope Design
3        Toyon     NULL     NULL            NULL     NULL
NULL     NULL      3        X-63 Telemetry  NULL     NULL
NULL     NULL      NULL     NULL            3        Digital Design
NULL     NULL      NULL     NULL            3        R/F Design

Natural Join
• A ⋈ B

A: Student (2011 Batch)
ID  Name     Course Taken
12  Bala     CSMI23
23  Karthik  CSHO23

B: Student (2011 & 2012 Batch)
ID  Name     CGPA
23  Karthik  7
13  Kumar    8.9

Cross Join
1   2        Course Taken  4   5        CGPA
12  Bala     CSMI23        23  Karthik  7
12  Bala     CSMI23        13  Kumar    8.9
23  Karthik  CSHO23        23  Karthik  7
23  Karthik  CSHO23        13  Kumar    8.9

Natural Join (No Condition)
ID  Name     Course Taken  CGPA
23  Karthik  CSHO23        7

Natural Join
• A ⋈ B when the common fields do not agree in any pair of tuples

A: Student (2011 Batch)
ID  Name     Course Taken
12  Bala     CSMI23
23  Karthik  CSHO23

B: Student (2011 & 2012 Batch)
ID  Name   CGPA
23  Selva  7
13  Kumar  8.9

Cross Join
1   2        Course Taken  4   5      CGPA
12  Bala     CSMI23        23  Selva  7
12  Bala     CSMI23        13  Kumar  8.9
23  Karthik  CSHO23        23  Selva  7
23  Karthik  CSHO23        13  Kumar  8.9

Natural Join (No Condition)
ID  Name  Course Taken  CGPA
(no tuples)

Nested Subqueries
• Subquery is a select-from-where expression that is nested within another query
• Common use of subqueries is to perform tests for set membership, make set comparisons, and determine cardinality
• Set Membership
  - SQL allows testing tuples for membership in a relation
  - The in connective tests for set membership, where the set is a collection of values produced by a select clause
  - The not in connective tests for the absence of set membership
  - in and not in operators can also be used on enumerated sets

Nested Subqueries

depositor
customer_name  ID
Bala           123
Sai            456
Jones          14

[Figure: two result lists of customer_name, (Bala, Sai, Jones) and (Jones)]

Nested Subqueries
• Test for Empty Relations
  - SQL includes a feature for testing whether a subquery has any tuples in its result
  - exists construct returns the value true if the argument subquery is nonempty
  - Find all customers who have both an account and a loan at the bank
• Find all customers who have an account at all the branches located in Brooklyn

Nested Subqueries
• Test for the Absence of Duplicate Tuples
  - SQL includes a feature for testing whether a subquery has any duplicate tuples in its result
  - unique construct returns the value true if the argument subquery contains no duplicate tuples
  - Find all customers who have at most one account at the Perryridge branch
• Find all customers who have at least two accounts at the Perryridge branch

Nested Subqueries
• Test for the Absence of Duplicate Tuples
  - unique test on a relation is defined to fail if and only if the relation contains
two tuples t1 and t2 such that t1 = t2 Student - Name ID Dept bala 123 CSE bala 123 CSE karthik 456 ECE unique will be False Since the test t1 = t2 fails if any of the fields of t1 or t2 are null, it is possible for unique to be true even if there are multiple copies of a tuple, as long as at least one of the attributes of the tuple is null Student Name ID Dept bala 123 NULL bala 123 CSE karthik 456 ECE unique will be True 142 Nested Subqueries • Set Comparison - Ability of a nested subquery to compare sets - Find the names of all branches that have assets greater than those of at least one branch located in Brooklyn <> means not equal to - - > some comparison in the where clause of the outer select is true if the assets value of the tuple is greater than at least one member of the set of all asset values for branches in Brooklyn SQL allows < some, <= some, >= some, = some and <> some comparisons some is identical to in, whereas <> some is not the same as not in 143 Keyword any is synonymous to some in SQL Nested Subqueries • Set Comparison - Find the names of all branches that have an asset value greater than that of each branch in Brooklyn <> means not equal to - SQL also allows < all, <= all, >= all, = all, <> all comparisons <> all is identical to not in Find the branch that has the highest average balance 144 THANK YOU 145
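The join and nested-subquery constructs covered in this unit can be tried out with a small script. The sketch below is only an illustration, not the queries from the slides: it assumes a hypothetical two-table bank schema, depositor(customer_name, account_number) and borrower(customer_name, loan_number), with made-up rows, and runs the SQL through Python's built-in sqlite3 module. The helper names() is likewise an assumption introduced for convenience.

```python
import sqlite3

# Hypothetical miniature bank schema and data (assumptions for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE depositor (customer_name TEXT, account_number TEXT);
    CREATE TABLE borrower  (customer_name TEXT, loan_number TEXT);
    INSERT INTO depositor VALUES ('Bala', 'A-101'), ('Sai', 'A-102'), ('Jones', 'A-103');
    INSERT INTO borrower  VALUES ('Jones', 'L-11'), ('Kumar', 'L-12');
""")

def names(sql):
    """Run a query and return the first column of every row as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# Natural join: tuples match on the common attribute customer_name,
# which appears only once in the result.
natural = conn.execute(
    "SELECT * FROM depositor NATURAL JOIN borrower").fetchall()

# Left outer join: depositors with no matching loan are padded with NULLs,
# so filtering on loan_number IS NULL gives "an account but no loan".
account_no_loan = names("""
    SELECT d.customer_name
    FROM depositor AS d LEFT OUTER JOIN borrower AS b
         ON d.customer_name = b.customer_name
    WHERE b.loan_number IS NULL
""")

# Set membership: in / not in applied to the result of a subquery.
both_in = names("""
    SELECT DISTINCT customer_name FROM borrower
    WHERE customer_name IN (SELECT customer_name FROM depositor)
""")
loan_only = names("""
    SELECT DISTINCT customer_name FROM borrower
    WHERE customer_name NOT IN (SELECT customer_name FROM depositor)
""")

# exists: true when the correlated subquery returns at least one tuple.
both_exists = names("""
    SELECT DISTINCT customer_name FROM borrower AS b
    WHERE EXISTS (SELECT * FROM depositor AS d
                  WHERE d.customer_name = b.customer_name)
""")

print(natural, account_no_loan, both_in, loan_only, both_exists)
```

SQLite was chosen only because it ships with Python. To my knowledge it does not implement the unique predicate or the some / any / all comparisons, and right and full outer joins need a recent SQLite version, so those constructs are left out of the sketch.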