COM 242 DATABASE MANAGEMENT SYSTEM (DBMS)

DATABASE MANAGEMENT SYSTEM (DBMS)

A DBMS consists of a collection of interrelated data and a set of programs to access them. The collection of data, usually referred to as the database, contains information about a particular enterprise. The primary goal of a DBMS is to provide an environment that is both convenient and efficient to use in storing and retrieving database information.

Database systems are designed to manage large bodies of information. The management of data involves both the definition of structures for the storage of information and the provision of mechanisms for the manipulation of information. In addition, the database system must provide for the safety of the information stored, despite system crashes or attempts at unauthorized access. If data are to be shared among several users, the system must provide for concurrent access.

Data Abstraction and Viewing

A major purpose of a database system is to provide the user with an abstract view of the data. The system hides the complexity and the detail of how the data are stored and retrieved in the DBMS. There are three levels of data abstraction.

1- Physical Level
The lowest level of abstraction describes how the data are actually stored on physical devices such as disks and tapes. Complex sets of low-level data structures are defined in accordance with the operating system in use.

2- Logical Level
The logical level of abstraction describes what data are stored in the database and what relationships exist among them. The logical structure of the database is defined by the database administrator, who must decide what information is to be kept in the database.

3- View Level
The highest level of abstraction describes only the part of the database that a particular user works with. The user runs an application program to interact with the selected portion of the database.
The user does not need to know about the complexities of the structure or how the data are stored and retrieved in the database.

[Fig: The three levels of data abstraction. View 1, View 2, ..., View n at the view level sit on top of the logical level, which sits on top of the physical level.]

Example: The following example may be used to clarify the distinction among the levels of abstraction. A high-level programming language may declare a customer record as follows.

    type customer = record
        Customer-Name   : string;
        Customer-Street : string;
        Customer-City   : string;
    end;

The code defines a new record called customer with three fields. Each field has a name and a type associated with it.

At the physical level, a customer record can be described as a block of consecutive storage locations (for example, words or bytes). The compiler hides this level of detail from the programmer.

At the logical level, the programmer uses the record name, the field names and the types to design procedures for the user.

At the view level, users employ the programs to store data in and retrieve data from the database. The details of the logical structure used by the programmers and the methods of the store and retrieve operations are again hidden.

Data Models

A collection of conceptual tools for describing data, data relationships, data semantics and consistency constraints underlying the structure of a database is called a data model. The various data models that have been proposed fall into three different groups.

1. Object-Based logical models
2. Record-Based logical models
3. Physical models

1- Object-Based Logical Models
Object-based logical models are used to describe data at the logical and view levels. They are characterized by the fact that they provide fairly flexible structuring capabilities and allow data constraints to be specified explicitly. There are many different models.
The most widely known ones are:
  - The Entity-Relationship model
  - The Object-Oriented model
  - The semantic data model
  - The functional data model
Let's have a closer look at the first two.

The Entity-Relationship Model

The E-R model is based on a perception of the real world as a collection of basic objects called ENTITIES and of relationships among those objects. An entity is a "thing" or "object" in the real world that is distinguishable from other objects. For example, each person is an entity, and bank accounts can be considered to be entities. Entities are described in a database by a set of attributes. For example, the attributes Account-Number and Account-Balance describe one particular account in a bank.

A relationship is an association among several entities. For example, a Depositor relationship associates a customer with each account that he or she has. The set of all entities of the same type and the set of all relationships of the same type are termed an entity set and a relationship set, respectively.

The overall logical structure of a database can be expressed graphically by an E-R diagram, which is built up from the following components.
  - Rectangles : which represent entity sets
  - Ellipses   : which represent attributes
  - Diamonds   : which represent relationships among entity sets
  - Lines      : which link attributes to entity sets and entity sets to relationships

[Fig: A sample E-R diagram. Entity set Customer (attributes Social-Security, Customer-Name, Customer-Street, Customer-City) is linked through the relationship Depositor to entity set Account (attributes Account-Number, Balance).]

Each component is labeled with the entity or relationship that it represents.

The Object-Oriented Model

Like the E-R model, the object-oriented model is based on a collection of objects. An object contains values stored in instance variables within the object. An object also contains bodies of code that operate on the object. These bodies of code are called methods. Objects that contain the same types of values and the same methods are grouped together into classes.
A class may be viewed as a type definition for objects. This combination of data and methods comprising a type definition is similar to a programming-language abstract data type.

Example: Consider an object representing a bank account. Such an object contains the instance variables account-number and balance. It contains a method pay-interest, which adds interest to the balance. Assume that the bank pays 6% interest on all accounts, but is now changing its policy to pay 5% on balances of less than $1000. Under most data models, making this adjustment would involve changing code in one or more application programs. Under the object-oriented model, the only change is made within the pay-interest method. The external interface to the objects remains the same.

2- Record-Based Logical Models
Record-based logical models are used in describing data at the logical and view levels. In contrast to object-based data models, they are used both to specify the overall logical structure of the database and to provide a higher-level description of the implementation.

Record-based models are so named because the database is structured in fixed-format records of several types. Each record type defines a fixed number of fields, or attributes, and each field is usually of a fixed length. The use of fixed-length records simplifies the physical-level implementation of the database. The three most widely accepted record-based data models are:
  - The relational model
  - The network model
  - The hierarchical model

The Relational Model

The relational model uses a collection of tables to represent both data and the relationships among those data. Each table has multiple columns, and each column has a unique name.

Example: The figure below represents a sample relational database comprising two tables: one shows bank customers, and the other shows the accounts that belong to those customers. It shows that the customer Johns, with Social-Security number 321-12-3123, lives on Main in Harrison and has an account A-201 with a balance of $500.
Here the tables have a common column (Account-Number) that links customers to their respective balances.

    Customer-Name  Social-Security  Customer-Street  Customer-City  Account-Number
    Johnson        192-83-7465      Alma             Rye            A-101
    Johns          321-12-3123      Main             Harrison       A-201
    Smith          345-24-8153      Park             Stamford       A-305

    Account-Number  Balance
    A-101           $600
    A-201           $500
    A-305           $900

    Fig: A sample relational database

The Network Model

In the network model the tables have no common columns. Instead, the relationships among data are represented by links, which can be viewed as pointers. The records in the database are organized as collections of arbitrary graphs.

[Fig: A sample network database. The customer records (Johnson, 192-83-7456, Alma, Rye), (Johns, 321-12-3123, Main, Harrison) and (Smith, 345-24-8153, Park, Stamford) are connected by links to the account records (A-101, $600), (A-201, $500) and (A-305, $900).]

The Hierarchical Model

The hierarchical model is similar to the network model in the sense that data and the relationships among them are represented by records and links, respectively. It differs from the network model in that the records are organized as collections of trees rather than arbitrary graphs.

[Fig: A sample hierarchical model. A root node DATABASE has a child customer record (Johnson, 192-83-7456, Alma, Rye), which in turn has a child account record (A-101, $600).]

3- Physical Models
Physical data models are used to describe data at the lowest level. In contrast to logical data models, there are few physical models in use. Two of the widely known ones are the unifying model and the frame-memory model.

Database Language

A database system provides two different types of languages: one to specify the database schema, and the other to express database queries and updates.

Data Definition Language

A database schema is specified by a set of definitions expressed in a special language called a data definition language (DDL). The result of the compilation of DDL statements is a set of tables that is stored in a special file called the data dictionary or data directory. A data dictionary is a file that contains METADATA, that is, data about data.
This file is consulted before actual data are read or modified in the database system. The storage structures and access methods used are also defined.

Data Manipulation Language

The language used for data access and data manipulation is called a data manipulation language (DML). Data manipulation includes
  - The retrieval of information stored in the database
  - The insertion of new information into the database
  - The deletion of information from the database
  - The modification of information stored in the database
Appropriate algorithms are defined to access the data efficiently and to allow a high level of data abstraction and human interaction with the database system.

Query Language

A query is a statement requesting the retrieval of information that is stored in the database. The portion of the DML that involves such requests is called a query language. Structured Query Language (SQL) is the most widely used example.

Application Programs

These are programs that contain user instructions to interact with the database system through calls to the DML. Programming languages such as C, Pascal, Delphi and Visual Basic are used to organize this interaction in a user-friendly environment.

Transaction Management

A transaction is a collection of operations that performs a single logical function in a database application.

Example: In a banking system, if a fund transfer is to be made from account-A to account-B, then the amount to be transferred is added to account-B and subtracted from account-A. Suppose that the account-A balance was $300 and the account-B balance was $100 prior to the transfer operation.

                   Account-A   Account-B
    Balance        $300        $100
    Transfer       -50         +50
    New balance    $250        $150

The application program is responsible for carrying out both operations on an all-or-none basis. That is, either both calculations are performed correctly and in full, or neither is performed at all. This is called Atomicity.

It is also essential that the execution of these operations preserve the consistency of the database.
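The all-or-none requirement can be illustrated with a minimal Python sketch. It is not how a DBMS implements transactions; the function name `transfer`, the exception `InsufficientFunds` and the dictionary of balances are assumptions made only for illustration. The idea is that the new balances are built on a copy, so a failed transfer leaves the original state untouched.

```python
class InsufficientFunds(Exception):
    """Raised when the source account cannot cover the transfer."""


def transfer(balances, source, target, amount):
    """Return new balances with `amount` moved, or raise and change nothing."""
    if balances[source] < amount:
        raise InsufficientFunds(source)
    # Build the new state on a copy; the caller's balances are never
    # left half-updated, which is the all-or-none property.
    updated = dict(balances)
    updated[source] -= amount
    updated[target] += amount
    # Consistency: a transfer must not change the combined total.
    assert sum(updated.values()) == sum(balances.values())
    return updated


balances = {"Account-A": 300, "Account-B": 100}
after = transfer(balances, "Account-A", "Account-B", 50)
print(after)  # {'Account-A': 250, 'Account-B': 150}
```

Either the caller receives the complete new state or an exception is raised and the old state is still in place; there is no moment at which only one of the two balances has changed.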
Consistency here means that the combined total of accounts A and B is the same before and after the fund transfer.

                        Account-A   Account-B   Account-A + Account-B
    Before execution    $300        $100        $400
    After execution     $250        $150        $400

The accounts are consistent because the transfer operations updated both balances correctly. Therefore the database remains consistent.

The Relational Model Structure

The relational model has established itself as the primary data model for commercial data processing applications. A relational database consists of tables, each of which is assigned a unique name. A row in a table represents a relationship among a set of values. Since a table is a collection of such relationships, there is a close correspondence between the concept of a table and the mathematical concept of a relation, from which the relational data model takes its name.

Consider the following banking enterprise, representing a portion of a total banking operation.

Consider the account table in Fig: 1.1. It has 3 column headers: Branch-Name, Account-Number and Balance. These headers are called attributes. For each attribute there is a set of permitted values, called the domain of that attribute. For the attribute Branch-Name, the domain is the set of all branch names. Let D1 denote this set, D2 the set of all account numbers and D3 the set of all balances. Each row entry is called a tuple. Any row in table 1.1 is a 3-tuple (V1, V2, V3), where V1 is a Branch-Name in the domain D1, V2 is an Account-Number in the domain D2 and V3 is a Balance in the domain D3.

In general, account will contain only a subset of the set of all possible rows. Therefore, account is a subset of D1 × D2 × D3.

    Branch-Name   Account-Number  Balance
    Downtown      A-101           500
    Mianus        A-215           700
    Perry ridge   A-102           400
    Round Hill    A-305           350
    Brighton      A-201           900
    Redwood       A-222           700
    Brighton      A-217           750

    Fig: 1.1 Account Relations

The table shown in Fig: 1.1 is the relation, and each row in the table is called a tuple.
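The statement that a relation is a subset of the product of its domains can be checked with a short Python sketch. As a simplifying assumption, each domain is taken to be just the values that appear in Fig: 1.1; the real domains (all branch names, all account numbers, all balances) are much larger.

```python
from itertools import product

# The account relation of Fig: 1.1 as a set of 3-tuples.
account = {
    ("Downtown", "A-101", 500),
    ("Mianus", "A-215", 700),
    ("Perry ridge", "A-102", 400),
    ("Round Hill", "A-305", 350),
    ("Brighton", "A-201", 900),
    ("Redwood", "A-222", 700),
    ("Brighton", "A-217", 750),
}

# Assumption: domains restricted to the values seen in the table.
D1 = {branch for branch, _, _ in account}
D2 = {number for _, number, _ in account}
D3 = {balance for _, _, balance in account}

# Every combination of one value from each domain.
all_possible_rows = set(product(D1, D2, D3))

print(len(all_possible_rows))        # 252
print(len(account))                  # 7
print(account <= all_possible_rows)  # True
```

Even with these tiny domains there are 252 possible rows, of which the relation holds only 7.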
Database Schema

The database schema gives the relation schema for each table, that is, the list of attributes and their corresponding domains. For example,

    Account-Schema = (Branch-Name, Account-Number, Balance)

Continuing with the banking example, we need to know where each branch is and what its assets are. Fig: 1.2 shows another relation, with the schema

    Branch-Schema = (Branch-Name, Branch-City, Assets)

Since we need customers, we must have a customer relation, shown in Fig: 1.3:

    Customer-Schema = (Customer-Name, Customer-Street, Customer-City)

We also need a relation to describe the association between customers and accounts. The relation schema describing this association is shown in Fig: 1.4:

    Depositor-Schema = (Customer-Name, Account-Number)

    Branch-Name   Branch-City  Assets
    Downtown      Brooklyn     9000000
    Redwood       Palo Alto    2100000
    Perry ridge   Horse Neck   1700000
    Mianus        Horse Neck   400000
    Round Hill    Horse Neck   8000000
    Pownal        Bennington   300000
    North Town    Rye          3700000
    Brighton      Brooklyn     7100000

    Fig: 1.2 Branch Relations

    Customer-Name  Customer-Street  Customer-City
    Jones          Main             Harrison
    Smith          North            Rye
    Hayes          Main             Harrison
    Curry          North            Rye
    Lindsay        Park             Pittsfield
    Turner         Putnam           Stamford
    Williams       Nasser           Princeton
    Adams          Spring           Pittsfield
    Johnson        Alma             Palo Alto
    Glenn          Sand Hill        Wood Side
    Brooks         Senator          Brooklyn
    Green          Walnut           Stamford

    Fig: 1.3 Customer Relations

    Customer-Name  Account-Number
    Johnson        A-101
    Smith          A-215
    Hayes          A-102
    Turner         A-305
    Johnson        A-201
    Jones          A-217
    Lindsay        A-222

    Fig: 1.4 Depositor Relations

We include two additional relations to describe data about the loans maintained in the various branches of the bank.

    Loan-Schema = (Branch-Name, Loan-Number, Amount)
    Borrower-Schema = (Customer-Name, Loan-Number)

    Branch-Name   Loan-Number  Amount
    Downtown      L-17         1000
    Redwood       L-23         2000
    Perry ridge   L-15         1500
    Downtown      L-14         1500
    Mianus        L-93         500
    Round Hill    L-11         900
    Perry ridge   L-16         1300

    Fig: 1.5 Loan-Branch Relations

    Customer-Name  Loan-Number
    Jones          L-17
    Smith          L-23
    Hayes          L-15
    Jackson        L-14
    Curry          L-93
    Smith          L-11
    Williams       L-17
    Adams          L-16

    Fig: 1.6 Borrower Relations

The banking enterprise we have described is derived from the E-R diagram shown in Fig: 1.7.

[Fig: 1.7 E-R diagram for the banking enterprise. Entity sets: Account (Account-No, Balance), Customer (Customer-Name, Customer-Street, Customer-City), Loan (Loan-No, Amount) and Branch (Branch-Name, Branch-City, Assets). Relationship sets: Depositor, Borrower, Account-Branch and Loan-Branch.]

Keys

It is important to be able to specify how an entity within an entity set, or a relationship within a relationship set, is distinguished. Keys allow us to make such distinctions.

Candidate Key
A candidate key is a set of one or more attributes that, taken collectively, uniquely identifies an entity in an entity set, and from which no attribute can be dropped without losing that uniqueness. For example, the Social-Security attribute of the entity set customer is sufficient to distinguish one customer entity from another. Thus Social-Security is a candidate key. Customer-Name, Customer-Street and Customer-City collectively form another candidate key, since it is highly unlikely that a second customer will have the same name, street and city.

Primary Key
The primary key is the candidate key that is chosen by the database administrator to uniquely identify the entities within the entity set. The remaining candidate keys (if there are any) become the alternate keys. In practice, an alternate key is protected against duplicate values only if the system is told to enforce its uniqueness.

Foreign Key
When two or more tables (attribute sets) are linked together, the primary key is used to set up the relationship. The primary key of one set of attributes that is used in relating it to another set of attributes is said to be a foreign key in the other set.

For example, in Branch-Schema, both {Branch-Name} and {Branch-Name, Branch-City} identify a branch uniquely, but only {Branch-Name} is a candidate key, because {Branch-Name, Branch-City} still identifies a branch after Branch-City is dropped. {Branch-Name} is therefore the primary key. {Branch-City} cannot be a candidate key because two different branches with different names can be located in the same city. {Branch-Name} in the Loan relation is a foreign key, since Branch-Name is the primary key of the Branch relation; it sets up the relationship between the two relations.
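The difference between a set of attributes that merely identifies rows and a candidate key can be sketched in Python on the Branch data of Fig: 1.2. The helper names `is_superkey` and `is_candidate_key` are illustrative assumptions. Note that a key is a property of the schema: checking one table instance can show that a set of attributes is not a key, but it cannot prove that it is one.

```python
from itertools import combinations

ATTRS = ("Branch-Name", "Branch-City", "Assets")
branch = [
    ("Downtown", "Brooklyn", 9000000),
    ("Redwood", "Palo Alto", 2100000),
    ("Perry ridge", "Horse Neck", 1700000),
    ("Mianus", "Horse Neck", 400000),
    ("Round Hill", "Horse Neck", 8000000),
    ("Pownal", "Bennington", 300000),
    ("North Town", "Rye", 3700000),
    ("Brighton", "Brooklyn", 7100000),
]


def is_superkey(rows, attrs, key):
    """True if no two rows agree on all attributes in `key`."""
    positions = [attrs.index(name) for name in key]
    seen = {tuple(row[i] for i in positions) for row in rows}
    return len(seen) == len(rows)


def is_candidate_key(rows, attrs, key):
    """True if `key` is unique and no proper subset of it is unique."""
    if not is_superkey(rows, attrs, key):
        return False
    return not any(
        is_superkey(rows, attrs, subset)
        for size in range(1, len(key))
        for subset in combinations(key, size)
    )


print(is_candidate_key(branch, ATTRS, ("Branch-Name",)))               # True
print(is_superkey(branch, ATTRS, ("Branch-Name", "Branch-City")))      # True
print(is_candidate_key(branch, ATTRS, ("Branch-Name", "Branch-City"))) # False
print(is_superkey(branch, ATTRS, ("Branch-City",)))                    # False
```

In this particular instance the Assets column also happens to contain no duplicates, which is exactly why instance data alone cannot establish a key: nothing in the schema prevents two branches from holding the same assets.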
The Relational Algebra

The relational algebra is a procedural query language. It consists of a set of operations that take one or more relations as input and produce a new relation as their result. The fundamental operations in the relational algebra are SELECT, PROJECT, UNION, SET DIFFERENCE and CARTESIAN PRODUCT.

a) The Select Operation
The select operation selects tuples that satisfy a given predicate. Sigma (σ) is used to denote the selection.

Example: To select those tuples of the Loan relation where the branch is “Perry ridge”

    σ Branch-Name = “Perry ridge” (Loan)

The result of the query is

    Branch-Name   Loan-Number  Amount
    Perry ridge   L-15         1500
    Perry ridge   L-16         1300

Example: To find all tuples in which the amount lent is more than $1200.

    σ Amount > 1200 (Loan)

Example: To find all tuples that have the Branch-Name “Perry ridge” and an amount greater than 1200.

    σ Branch-Name = “Perry ridge” ∧ Amount > 1200 (Loan)

In general, comparisons are carried out by using (=, ≠, <, >, ≤, ≥) in the selection predicate. Furthermore, we can combine several predicates into a larger predicate by using the connectives AND (∧) and OR (∨).

b) The Project Operation
The project operation lists the named attributes of each tuple and suppresses the others. The projection is represented by pi (π).

Example: π Loan-Number, Amount (Loan) will result in

    Loan-Number  Amount
    L-17         1000
    L-23         2000
    L-15         1500
    L-14         1500
    L-93         500
    L-11         900
    L-16         1300

Composition of Relational Operations
The result of a relational operation is also a relation. Operations can therefore be composed into an expression, just as arithmetic operations are.

Example: To project the names of the customers who live in “Harrison”

    π Customer-Name (σ Customer-City = “Harrison” (Customer))

will result in

    Customer-Name
    Jones
    Hayes

c) The Union Operation
The union operation combines the tuples of two relations that have the same attributes into a single relation. The (∪) character is used for uniting two queries into one.
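The select and project operations, and their composition, can be sketched in Python on the Loan relation of Fig: 1.5. This is only an illustration under stated assumptions: a relation is modelled as a list of dictionaries, and the function names `select` and `project` are chosen here, not taken from any library.

```python
LOAN = [
    {"Branch-Name": "Downtown", "Loan-Number": "L-17", "Amount": 1000},
    {"Branch-Name": "Redwood", "Loan-Number": "L-23", "Amount": 2000},
    {"Branch-Name": "Perry ridge", "Loan-Number": "L-15", "Amount": 1500},
    {"Branch-Name": "Downtown", "Loan-Number": "L-14", "Amount": 1500},
    {"Branch-Name": "Mianus", "Loan-Number": "L-93", "Amount": 500},
    {"Branch-Name": "Round Hill", "Loan-Number": "L-11", "Amount": 900},
    {"Branch-Name": "Perry ridge", "Loan-Number": "L-16", "Amount": 1300},
]


def select(relation, predicate):
    """Keep only the tuples for which the predicate holds."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Keep only the named attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        projected = {name: row[name] for name in attributes}
        if projected not in result:
            result.append(projected)
    return result


perry = select(LOAN, lambda row: row["Branch-Name"] == "Perry ridge")
large = select(LOAN, lambda row: row["Amount"] > 1200)

# Composition: the output of select is itself a relation,
# so it can be passed straight to project.
numbers = project(perry, ["Loan-Number"])
print(numbers)  # [{'Loan-Number': 'L-15'}, {'Loan-Number': 'L-16'}]
```

Because both functions take a relation and return a relation, they nest in the same way the algebra expressions do.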
Example: To find all the customers who have either an account or a loan at the bank. The customers with an account are given by

    π Customer-Name (Depositor)

and the customers with a loan by

    π Customer-Name (Borrower)

To answer the query, we need the union of these two sets.

    π Customer-Name (Borrower) ∪ π Customer-Name (Depositor)

    Customer-Name
    Johnson
    Smith
    Hayes
    Turner
    Jones
    Lindsay
    Jackson
    Curry
    Williams
    Adams

Notice that there are only 10 tuples in the result. The duplicate names are eliminated, since all relations are also sets. Here Smith, Jones and Hayes are Borrowers as well as Depositors.

d) The Set Difference Operation
The set difference operation allows us to find tuples that are in one relation and not in another. The minus sign (−) is used.

Example: To find the customers of the bank who have an account but not a loan.

    π Customer-Name (Depositor) − π Customer-Name (Borrower)

will result in

    Customer-Name
    Johnson
    Turner
    Lindsay

e) The Cartesian Product Operation
The Cartesian product operation, denoted by a cross (×), allows us to combine information from any two relations.

Example: The relation schema for r = Borrower × Loan is

    (Borrower.Customer-Name, Borrower.Loan-Number, Loan.Branch-Name, Loan.Loan-Number, Loan.Amount)

Since the Cartesian product joins every tuple of one relation to every tuple of the other, the resulting relation will be as below.

    Customer-Name  Borrower.Loan-Number  Branch-Name  Loan.Loan-Number  Amount
    Jones          L-17                  Downtown     L-17              1000
    Jones          L-17                  Redwood      L-23              2000
    :              :                     :            :                 :
    Jones          L-17                  Perry ridge  L-16              1300
    Smith          L-23                  Downtown     L-17              1000
    Smith          L-23                  Redwood      L-23              2000
    :              :                     :            :                 :
    Smith          L-23                  Perry ridge  L-16              1300
    :              :                     :            :                 :

Suppose that we want the names of all customers who have a loan at the Perry ridge branch. We need information in both the Loan relation and the Borrower relation.
    σ Branch-Name = “Perry ridge” (Borrower × Loan)

Result:

    Customer-Name  Borrower.Loan-Number  Branch-Name  Loan.Loan-Number  Amount
    Jones          L-17                  Perry ridge  L-15              1500
    Jones          L-17                  Perry ridge  L-16              1300
    Smith          L-23                  Perry ridge  L-15              1500
    Smith          L-23                  Perry ridge  L-16              1300
    :              :                     :            :                 :
    Adams          L-16                  Perry ridge  L-15              1500
    Adams          L-16                  Perry ridge  L-16              1300

The resulting relation still pairs every borrower with every Perry ridge loan, whether or not the loan is theirs. We therefore keep only the tuples in which the two loan numbers agree:

    σ Borrower.Loan-Number = Loan.Loan-Number (σ Branch-Name = “Perry ridge” (Borrower × Loan))

    Customer-Name  Borrower.Loan-Number  Branch-Name  Loan.Loan-Number  Amount
    Hayes          L-15                  Perry ridge  L-15              1500
    Adams          L-16                  Perry ridge  L-16              1300

And if we just want to project the names:

    π Customer-Name (σ Borrower.Loan-Number = Loan.Loan-Number (σ Branch-Name = “Perry ridge” (Borrower × Loan)))

    Customer-Name
    Hayes
    Adams

Additional Operations

The fundamental operations of the relational algebra are sufficient to express any relational algebra query. However, certain common queries are lengthy to express using only the fundamental operations. Therefore some additional operations are defined.

a) The Set-Intersection Operation
The set-intersection operation, denoted by (∩), selects the tuples that appear in both of two relations.

Example: Suppose that we wish to find all customers who have both a loan and an account. Using only the set difference operation, with r = π Customer-Name (Depositor) and s = π Customer-Name (Borrower), we must write r − (r − s):

    π Customer-Name (Depositor) − (π Customer-Name (Depositor) − π Customer-Name (Borrower))

Using set intersection we can write r ∩ s:

    π Customer-Name (Depositor) ∩ π Customer-Name (Borrower)

so that r ∩ s = r − (r − s). The result of the query will be

    Customer-Name
    Hayes
    Jones
    Smith

b) The Natural-Join Operation
The natural-join operation is a binary operation that allows us to combine certain selections and a Cartesian product into one operation.

Example: Consider the query, "Find the names of all customers who have a loan at the bank, and find the amount of the loan."

    π Customer-Name, Loan.Loan-Number, Amount (σ Borrower.Loan-Number = Loan.Loan-Number (Borrower × Loan))

The natural-join operation, denoted by the join symbol (⋈), forces equality on those attributes that appear in both relations and removes the duplicate attributes.

    π Customer-Name, Loan-Number, Amount (Borrower ⋈ Loan)

The resulting table is

    Customer-Name  Loan-Number  Amount
    Jones          L-17         1000
    Smith          L-23         2000
    Hayes          L-15         1500
    Jackson        L-14         1500
    Curry          L-93         500
    Smith          L-11         900
    Williams       L-17         1000
    Adams          L-16         1300

Example: Find the names of all branches with customers who have an account in the bank and who live in Harrison.

    π Branch-Name (σ Customer-City = "Harrison" (Customer ⋈ Account ⋈ Depositor))

The result is

    Branch-Name
    Brighton
    Perry ridge

Example: Find the customers who have both a loan and an account in the bank. Two possible expressions can be written for this example:

    i)  π Customer-Name (Borrower ⋈ Depositor)
    ii) π Customer-Name (Borrower) ∩ π Customer-Name (Depositor)

The result in both cases is

    Customer-Name
    Hayes
    Jones
    Smith

c) The Division Operation
The division operation, denoted by (÷), is suited to queries that include the phrase “for all”.

Example: Suppose that we want to find all customers who have an account at all the branches located in Brooklyn.

    r1 ← π Branch-Name (σ Branch-City = "Brooklyn" (Branch))

    Branch-Name
    Brighton
    Downtown

This gives us all the branches in Brooklyn. Then

    r2 ← π Customer-Name, Branch-Name (Depositor ⋈ Account)

gives us the customers together with the branches in which they have accounts.

    Customer-Name  Branch-Name
    Johnson        Downtown
    Smith          Mianus
    Hayes          Perry ridge
    Turner         Round Hill
    Johnson        Brighton
    Jones          Brighton
    Lindsay        Redwood

Hence r2 ÷ r1 will result in

    Customer-Name
    Johnson

d) The Assignment Operation
It is convenient at times to write a relational algebra expression in parts, assigning each part to a temporary relation variable. The assignment operation is denoted by (←) and works like the assignment operation used in a programming language.
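The natural-join and division operations described above can be sketched in Python on the Depositor and Account relations of Fig: 1.4 and Fig: 1.1. The sketch rests on assumptions: relations are lists of dictionaries, the function names `natural_join` and `divide` are illustrative, and the Brooklyn branches are written out directly rather than computed from the Branch relation.

```python
def rows(attributes, values):
    """Build a relation (list of dicts) from attribute names and value tuples."""
    return [dict(zip(attributes, value)) for value in values]


ACCOUNT = rows(("Branch-Name", "Account-Number", "Balance"), [
    ("Downtown", "A-101", 500), ("Mianus", "A-215", 700),
    ("Perry ridge", "A-102", 400), ("Round Hill", "A-305", 350),
    ("Brighton", "A-201", 900), ("Redwood", "A-222", 700),
    ("Brighton", "A-217", 750),
])
DEPOSITOR = rows(("Customer-Name", "Account-Number"), [
    ("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"),
    ("Turner", "A-305"), ("Johnson", "A-201"), ("Jones", "A-217"),
    ("Lindsay", "A-222"),
])


def natural_join(r, s):
    """Pair tuples that agree on every attribute the two relations share."""
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s
            if all(a[name] == b[name] for name in common)]


def divide(r, s):
    """Return the tuples over r's remaining attributes that are
    paired in r with every tuple of s."""
    s_attrs = list(s[0])
    keep = [name for name in r[0] if name not in s_attrs]
    r_set = {tuple(row[name] for name in keep + s_attrs) for row in r}
    s_set = {tuple(row[name] for name in s_attrs) for row in s}
    candidates = {tuple(row[name] for name in keep) for row in r}
    return sorted(c for c in candidates
                  if all(c + t in r_set for t in s_set))


joined = natural_join(DEPOSITOR, ACCOUNT)
r2 = [{"Customer-Name": row["Customer-Name"],
       "Branch-Name": row["Branch-Name"]} for row in joined]
r1 = [{"Branch-Name": "Brighton"}, {"Branch-Name": "Downtown"}]

print(divide(r2, r1))  # [('Johnson',)]
```

Johnson is the only customer paired with both Brooklyn branches; Jones appears with Brighton only and is therefore excluded.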
Example:

    r ← π Customer-Name (Depositor)
    s ← π Customer-Name (Borrower)
    result ← r − (r − s)

The Extended Relational-Algebra Operations

The basic relational algebra operations have been extended in several ways. A simple extension is to allow arithmetic operations as part of a projection. An important extension is to allow aggregate operations, such as computing the sum of the elements of a set, or their average. Another important extension is the outer-join operation, which allows relational algebra expressions to deal with null values, which model missing information.

a) Generalized Projection
The generalized projection operation extends the projection operation by allowing arithmetic functions to be used in the projection list. The generalized projection operation has the form

    π F1, F2, F3 … Fn (E)

where E is a relational-algebra expression, and each of F1, F2, F3 … Fn is an arithmetic expression involving constants and attributes in the schema of E.

Suppose that we have a relation credit-info, as shown in the following figure, which lists the credit limit and the expenses so far (the credit balance) on each account.

    Customer-Name  Limit  Credit-Balance
    Jones          6000   700
    Smith          2000   400
    Hayes          1500   1500
    Curry          2000   1750

    Fig: The credit-info relation

If we want to find out how much more each person can spend, we can write the following expression.

    π Customer-Name, Limit − Credit-Balance (Credit-info)

The result will be

    Customer-Name  Limit − Credit-Balance
    Jones          5300
    Smith          1600
    Hayes          0
    Curry          250

b) Outer-Join
The outer-join operation is an extension of the join operation to deal with missing information. Suppose that we have the following schemas, which contain data on full-time employees.

    Employee (employee-name, street, city)
    ft-works (employee-name, Branch-Name, salary)

Let us consider the employee and ft-works relations shown in the following figures.
    Employee-Name  Street    City
    Coyote         Toon      Hollywood
    Rabbit         Tunnel    Carrot Ville
    Smith          Revolver  Death Valley
    Williams       Sea view  Seattle

    Employee-Name  Branch-Name  Salary
    Coyote         Mesa         1500
    Rabbit         Mesa         1300
    Gates          Redwood      5300
    Williams       Redwood      1500

    Fig: The employee and ft-works relations

Suppose that we want to generate a single relation with all the information (Street, City, Branch-Name and Salary) about full-time employees. A possible approach would be to use a natural join operation as follows.

    π Employee-Name, Street, City, Branch-Name, Salary (Employee ⋈ Ft-works)

The result will be

    Employee-Name  Street    City          Branch-Name  Salary
    Coyote         Toon      Hollywood     Mesa         1500
    Rabbit         Tunnel    Carrot Ville  Mesa         1300
    Williams       Sea view  Seattle       Redwood      1500

    Fig: Result of (employee ⋈ ft-works)

Notice that we have lost the information on Smith, since the tuple describing Smith is missing from the ft-works relation. Similarly, we have lost the information on Gates, since the tuple describing Gates is missing from the employee relation. We can use the outer-join operation to avoid this loss. There are three forms of the operation:
  - Left outer-join, denoted (⟕)
  - Right outer-join, denoted (⟖)
  - Full outer-join, denoted (⟗)

Applying the left outer-join (employee ⟕ ft-works), the result is shown in the following figure.

    Employee-Name  Street    City          Branch-Name  Salary
    Coyote         Toon      Hollywood     Mesa         1500
    Rabbit         Tunnel    Carrot Ville  Mesa         1300
    Williams       Sea view  Seattle       Redwood      1500
    Smith          Revolver  Death Valley  Null         Null

    Fig: The result of (employee ⟕ ft-works)

The left outer-join takes all tuples in the left relation that did not match any tuple in the right relation, pads them with NULL values for all the attributes that come from the right relation, and adds them to the result of the natural join.
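The padding step of the left outer-join can be sketched in Python on the employee and ft-works relations. The representation (lists of dictionaries, with `None` standing for Null) and the function name `left_outer_join` are assumptions for illustration only.

```python
EMPLOYEE = [
    {"Employee-Name": "Coyote", "Street": "Toon", "City": "Hollywood"},
    {"Employee-Name": "Rabbit", "Street": "Tunnel", "City": "Carrot Ville"},
    {"Employee-Name": "Smith", "Street": "Revolver", "City": "Death Valley"},
    {"Employee-Name": "Williams", "Street": "Sea view", "City": "Seattle"},
]
FT_WORKS = [
    {"Employee-Name": "Coyote", "Branch-Name": "Mesa", "Salary": 1500},
    {"Employee-Name": "Rabbit", "Branch-Name": "Mesa", "Salary": 1300},
    {"Employee-Name": "Gates", "Branch-Name": "Redwood", "Salary": 5300},
    {"Employee-Name": "Williams", "Branch-Name": "Redwood", "Salary": 1500},
]


def left_outer_join(left, right):
    """Natural join that also keeps unmatched left tuples, padded with None."""
    common = set(left[0]) & set(right[0])
    right_only = [name for name in right[0] if name not in common]
    result = []
    for a in left:
        matches = [b for b in right
                   if all(a[name] == b[name] for name in common)]
        if matches:
            result.extend({**a, **b} for b in matches)
        else:
            # No partner on the right: pad the right-hand attributes.
            result.append({**a, **{name: None for name in right_only}})
    return result


for row in left_outer_join(EMPLOYEE, FT_WORKS):
    print(row)
```

Swapping the two arguments gives the right outer-join (up to column order), in which Gates is kept and padded instead of Smith.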
The padded tuple for Smith is (Smith, Revolver, Death Valley, Null, Null).

Similarly, the right outer-join (employee ⟖ ft-works) will result in the following figure.

    Employee-Name  Street    City          Branch-Name  Salary
    Coyote         Toon      Hollywood     Mesa         1500
    Rabbit         Tunnel    Carrot Ville  Mesa         1300
    Williams       Sea view  Seattle       Redwood      1500
    Gates          Null      Null          Redwood      5300

    Fig: Result of (employee ⟖ ft-works)

The full outer-join (employee ⟗ ft-works) will be

    Employee-Name  Street    City          Branch-Name  Salary
    Coyote         Toon      Hollywood     Mesa         1500
    Rabbit         Tunnel    Carrot Ville  Mesa         1300
    Williams       Sea view  Seattle       Redwood      1500
    Smith          Revolver  Death Valley  Null         Null
    Gates          Null      Null          Redwood      5300

    Fig: Result of (employee ⟗ ft-works)

c) Aggregate Functions
Aggregate functions are functions that take a collection of values and return a single value as a result.

i) Sum: Returns the sum of the values of a numerical attribute in the relation.

    Sum Salary (ft-works)

The result is 9600.

ii) Distinct: There are cases where we must eliminate duplicate values before computing an aggregate function. We use the distinct statement with the function. Consider a part-time employee relation called pt-works, as shown below.

    Employee-Name  Branch-Name  Salary
    Johnson        Downtown     1500
    Lorena         Downtown     1300
    Peterson       Downtown     2500
    Sato           Austin       1600
    Rao            Austin       1500
    Gopal          Perry ridge  5300
    Adams          Perry ridge  1500
    Brown          Perry ridge  1300

    Fig: The pt-works relation

If we want to count the number of branches in the relation using the aggregate function count, we want to avoid counting duplicate Branch-Name values. The expression giving the number of branches would be written as

    Count-Distinct Branch-Name (pt-works)

The result will be 3 (Downtown, Austin, Perry ridge).

iii) Grouping: There are circumstances where we would like to apply the aggregate function not to a single set of tuples but to several groups, where each group is a set of tuples. We do so by using an operation called grouping.
Example: We may want to find the total salary of all part-time employees at each branch of the bank individually, rather than in the entire bank. To do so we need to partition the relation pt-works into groups based on the branch and apply the aggregate function to each group. The “G” operator achieves the desired result.

    Branch-Name G Sum Salary (pt-works)

The result is

    Branch-Name  Salary
    Downtown     5300
    Austin       3100
    Perry ridge  8100

iv) Min & Max: These functions return the minimum or maximum value in the selected column of the relation.

Example: Employee-Name, Min Salary (pt-works) will return

    Employee-Name  Salary
    Lorena         1300
    Brown          1300

Employee-Name, Max Salary (pt-works) will return

    Employee-Name  Salary
    Gopal          5300

If we want to find the maximum salary in each Branch-Name group:

    π Branch-Name, Salary (Branch-Name G Max Salary (pt-works))

The result is

    Branch-Name  Salary
    Downtown     2500
    Austin       1600
    Perry ridge  5300

v) Avg: This function returns the average value of the selected column in the relation.

Example: Avg Salary (ft-works) will return 2400, since the sum of the salaries in ft-works is 9600 and the average is 9600/4 = 2400.

The Modification of the Database

The modification of the database involves adding, deleting and changing information in the database. We express database modifications using the assignment operation.

a) Deletion: A delete request is expressed in much the same way as a query. However, instead of displaying the selected tuples, we remove them. The deletion expression is written as

    r ← r − E

where r is a relation and E is a relational-algebra query.
Example: To delete all of Smith's accounts:

Depositor ← Depositor − σ Customer-Name = "Smith" (Depositor)

To delete all loans with amounts in the range 0-50:

Loan ← Loan − σ Amount ≥ 0 and Amount ≤ 50 (Loan)

To delete all accounts at branches located in Brooklyn:

r1 ← σ Branch-City = "Brooklyn" (Account ⋈ Branch)
r2 ← Π Branch-Name, Account-Number, Balance (r1)
Account ← Account − r2

b) Insertion: To insert data into a relation we either specify a tuple or write a query whose result is a set of tuples to be inserted. An insertion is expressed by

r ← r ∪ E

where r is a relation and E is a relational-algebra expression.

Suppose that we want to insert the fact that Smith has $1200 in account A-973 at the Perry ridge branch.

Account ← Account ∪ {("Perry ridge", A-973, 1200)}
Depositor ← Depositor ∪ {("Smith", A-973)}

As an example of inserting tuples based on the result of a query, suppose that as a gift we want to open a new account with a balance of 200 for every customer who has a loan at the Perry ridge branch. Here the loan numbers will be used as account numbers.

r1 ← σ Branch-Name = "Perry ridge" (Borrower ⋈ Loan)
r2 ← Π Branch-Name, Loan-Number (r1)
Account ← Account ∪ (r2 × {(200)})
Depositor ← Depositor ∪ Π Customer-Name, Loan-Number (r1)

The result will be

Customer-Name   Account-Number
Hayes           A-15
Adams           A-16

added to the Depositor relation as new customers, and

Branch-Name   Account-Number   Balance
Perry ridge   A-15             200
Perry ridge   A-16             200

added to the Account relation as new accounts.

c) Updating: In certain situations we may wish to change some of the values in an existing tuple without changing all of them. We use the generalized projection operator to do this task:

r ← Π F1, F2, F3, …, Fn (r)

where each Fi is either the i-th attribute of r, if that attribute is not to be changed, or an expression giving the new value of the attribute.

Example: Suppose that interest payments of 5% are to be paid on all accounts. Every balance will be increased by 5%.

Account ← Π Branch-Name, Account-Number, Balance ← Balance × 1.05 (Account)

Example: Suppose that accounts with balances over 2000 will receive 6%, whereas all the others will receive 5%.
Account ← Π Branch-Name, Account-Number, Balance ← Balance × 1.06 (σ Balance > 2000 (Account))
        ∪ Π Branch-Name, Account-Number, Balance ← Balance × 1.05 (σ Balance ≤ 2000 (Account))

Example: Consider the following DB tables that represent a stock control management system.

Stock Item Relation
Each stock item is represented by its stock code, stock name, minimum stock level, sale price and amount in hand.

Sales Relation
There are four branches of the company that sell the commonly defined products. The relation for sales includes the date, branch number, stock code and quantity sold.

Branches Relation
The four branches of the firm involved in sales are listed with branch code, branch name and town.

The management requires the reports listed after the tables to be generated for evaluation. The reports will project the necessary information by evaluating relational-algebra statements.

Stock Items:

Stock Code   Stock Name    Min Stock Level   Sale Price   Stock in Hand
MON 15       15" Monitor   4                 $150         3
KEY 04       Keyboard      12                $20          28
DIS 20       20 GB Disk    6                 $90          14
MB 100       Main Board    2                 $300         1
CD 52        52×CD Rom     5                 $60          7
MS 10        Mouse         24                $9           42
CDWR         700 MB CD     100               $0.8         245
SC 64        64 MB VGA     3                 $50          2

Sales Relation:

Date      Branch-Number   Stock Code   Quantity Sold
1/12/03   B2              MB 100       1
1/12/03   B3              CD 52        1
1/12/03   B2              CDWR         5
1/12/03   B1              CDWR         10
1/12/03   B4              CDWR         2
1/12/03   B1              KEY 04       1
1/12/03   B2              MS 10        1
2/12/03   B2              CD 52        1
2/12/03   B4              DIS 20       1
2/12/03   B1              SC 64        1
2/12/03   B3              MON 15       1
2/12/03   B4              CD 52        1
2/12/03   B1              CDWR         12
2/12/03   B2              MB 100       1
3/12/03   B1              CDWR         5
3/12/03   B2              CDWR         2
3/12/03   B4              CDWR         4
3/12/03   B3              CDWR         2
3/12/03   B3              KEY 04       1
3/12/03   B2              KEY 04       2
3/12/03   B2              MS 10        3
3/12/03   B1              CD 52        1
Null      B2              KEY 04       1

Branch Relation:

Branch-Code   Branch-Name              Town
B1            Shopping Center Branch   Nicosia
B2            Kyrenia Str. Branch      Nicosia
B3            Famagusta Branch         Famagusta
B4            Harbor Branch            Kyrenia

Reports to be projected:

a) The sum of all sales in all branches
b) The sum of sales by branch
c) The number sold of each stock item (overall)
d) The number of "Keyboards" sold on 1/12/2003
e) The number of "700 MB CD" sold between the dates 1/12/2003 and 3/12/2003
f) The number of "700 MB CD" sold in Nicosia
g) The number of "CD 52" sold in "Harbor Branch"
h) The items that are below Min Stock Level
i) The overall sales in Nicosia
j) The number of sales in Shopping Center Branch of items whose price is below $40
k) The items sold in Famagusta Branch and Harbor Branch

STRUCTURED QUERY LANGUAGE (SQL):

SQL is a transform-oriented, non-procedural language designed to use relations to transform inputs into required outputs. SQL has two major components:

A Data Definition Language (DDL) for defining the database structure
A Data Manipulation Language (DML) for retrieving and updating data

SQL contains only these definitional and manipulative commands. It does not contain flow control commands: there is no IF ... THEN ... ELSE, GOTO, WHILE ... DO or any other command that provides flow control. Due to this lack of computational completeness, SQL can be used in two ways:

Use SQL interactively by entering the statements at the terminal.
Embed SQL statements in a procedural language.

SQL DML Statements

The data manipulation statements in SQL are

SELECT   To query data in the database
INSERT   To insert data into a table
UPDATE   To update data in a table
DELETE   To delete data from a table

The SELECT Statement

The purpose of the SELECT statement is to retrieve and display data from one or more database tables. It is an extremely powerful command, capable of performing the equivalent of the relational algebra's selection, projection and join in a single statement.
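To make the claim about selection, projection and join concrete, here is a minimal sketch that runs one SELECT through Python's built-in sqlite3 module. The underscore column names, the in-memory database, and the use of only the 1/12/03 sales rows are assumptions chosen for the illustration; the notes themselves write column names with hyphens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE branch (branch_code TEXT, branch_name TEXT, town TEXT);
    CREATE TABLE sales (sales_date TEXT, branch_number TEXT,
                        stock_code TEXT, quantity INTEGER);
""")
conn.executemany("INSERT INTO branch VALUES (?, ?, ?)", [
    ("B1", "Shopping Center Branch", "Nicosia"),
    ("B2", "Kyrenia Str. Branch", "Nicosia"),
    ("B3", "Famagusta Branch", "Famagusta"),
    ("B4", "Harbor Branch", "Kyrenia"),
])
# Only the rows dated 1/12/03, to keep the sketch short.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("1/12/03", "B2", "MB 100", 1),
    ("1/12/03", "B3", "CD 52", 1),
    ("1/12/03", "B2", "CDWR", 5),
    ("1/12/03", "B1", "CDWR", 10),
    ("1/12/03", "B4", "CDWR", 2),
    ("1/12/03", "B1", "KEY 04", 1),
    ("1/12/03", "B2", "MS 10", 1),
])

# One statement does all three operations:
#   join       -> sales.branch_number = branch.branch_code
#   selection  -> branch.town = 'Nicosia'
#   projection -> only stock_code is returned
query = """
    SELECT DISTINCT sales.stock_code
    FROM sales, branch
    WHERE sales.branch_number = branch.branch_code
      AND branch.town = 'Nicosia'
    ORDER BY sales.stock_code
"""
codes_sold_in_nicosia = [row[0] for row in conn.execute(query)]
print(codes_sold_in_nicosia)  # ['CDWR', 'KEY 04', 'MB 100', 'MS 10']
```

In relational algebra the same request would need a join, a selection and a projection written as three nested operators.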
The general format of the SELECT statement is:

    SELECT     [DISTINCT | ALL] {* | [column-expression]}
    FROM       table-name(s)
    [WHERE     condition]
    [GROUP BY  column-list] [HAVING condition]
    [ORDER BY  column-list]

SELECT     Specifies which columns are to appear in the output
FROM       Specifies the table(s) to be used
WHERE      Filters the rows subject to some condition
GROUP BY   Forms groups of rows with the same column value
HAVING     Filters the groups subject to some condition
ORDER BY   Specifies the order of the output

The order of the clauses in the SELECT statement cannot be changed. The only mandatory clauses are SELECT and FROM; the remainder are optional.

Considering the stock control DB, the SELECT statement can be used to retrieve data as shown below.

i) Retrieve all columns, all rows

Example: List all columns and all rows of the Stock table.

    SELECT Stock-Code, Stock-Name, Stock-Min-Level, Stock-Price, Stock-Quantity
    FROM Stock ;

OR

    SELECT *
    FROM Stock ;

The result is:

Stock Code   Stock Name    Stock Min Level   Sale Price   Stock in Hand
MON 15       15" Monitor   4                 $150         3
KEY 04       Keyboard      12                $20          28
DIS 20       20 GB Disk    6                 $90          14
MB 100       Main Board    2                 $300         1
CD 52        52×CD Rom     5                 $60          7
MS 10        Mouse         24                $9           42
CDWR         700 MB CD     100               $0.8         245
SC 64        64 MB VGA     3                 $50          2

ii) Retrieve specific columns, all rows

Example: List all rows of Stock by Stock-Name, Stock-Price.

    SELECT Stock-Name, Stock-Price
    FROM Stock ;

The result is:

Stock-Name    Stock-Price
15" Monitor   $150
Keyboard      $20
20 GB Disk    $90
Main Board    $300
52×CD Rom     $60
Mouse         $9
700 MB CD     $0.8
64 MB VGA     $50

iii) Retrieve specific columns, unique rows

Example: List all Branch-Number values in Sales.

    SELECT DISTINCT Branch-Number
    FROM Sales ;

The result is:

Branch-Number
B2
B3
B1
B4

iv) Retrieve calculated fields, all rows

Example: List all the stocks with their current value in hand; list Stock-Code, Stock-Value.

    SELECT Stock-Code, Stock-Price * Stock-Quantity
    FROM Stock ;

The result is:

Stock Code   Col 2
MON 15       $450
KEY 04       $560
:            :
SC 64        $100

The row selection:
Very often a search condition is imposed on the rows to restrict the selection. The WHERE clause is used for setting up a search condition. There are 5 basic search conditions:

Comparison       Compares the value of one expression to the value of another expression
Range            Tests whether the value of an expression falls within a specified range of values
Set Membership   Tests whether the value of an expression equals one of a set of values
Pattern Match    Tests whether a string matches a specified pattern
Null             Tests whether a column has a Null (unknown) value

v) Comparison search condition

Example: List all stocks whose price is greater than $90 by Stock-Code, Stock-Name, Stock-Price.

    SELECT Stock-Code, Stock-Name, Stock-Price
    FROM Stock
    WHERE Stock-Price > 90 ;

The result is:

Stock-Code   Stock-Name    Stock-Price
MON 15       15" Monitor   $150
MB 100       Main Board    $300

In conditional expressions the following comparison operators are used:

=    Equals
<    Less than
>    Greater than
<=   Less than or equal to
>=   Greater than or equal to
<>   Not equal to

More complex predicates can be generated using AND, OR and NOT. The rules for evaluating such conditional expressions are as follows:

An expression is evaluated left to right
Expressions enclosed in brackets are evaluated first
NOTs are evaluated before ANDs and ORs
ANDs are evaluated before ORs

vi) Compound comparison search condition

Example: List all sales of 64 MB VGA or Keyboard by Sales-Date, Branch-Number, Stock-Code.

    SELECT Sales-Date, Branch-Number, Stock-Code
    FROM Sales
    WHERE Stock-Code = 'SC 64' OR Stock-Code = 'KEY 04' ;

The result is:

Date      Branch-Number   Stock Code
1/12/03   B1              KEY 04
2/12/03   B1              SC 64
3/12/03   B3              KEY 04
3/12/03   B2              KEY 04
Null      B2              KEY 04

vii) Range search condition (BETWEEN / NOT BETWEEN)

Example: List all stocks whose Stock-Quantity is between 5 and 50 by Stock-Code, Stock-Quantity. The BETWEEN test includes both end points of the range.

    SELECT Stock-Code, Stock-Quantity
    FROM Stock
    WHERE Stock-Quantity BETWEEN 5 AND 50 ;

The result is:

Stock-Code   Stock-Quantity
KEY 04       28
DIS 20       14
CD 52        7
MS 10        42

viii) Set membership search condition (IN / NOT IN)

Example: List the Branch-Name of the branches in Nicosia and Kyrenia.

    SELECT Branch-Name, Branch-Town
    FROM Branch
    WHERE Branch-Town IN ('Nicosia', 'Kyrenia') ;

The result is:

Branch-Name              Branch-Town
Shopping Center Branch   Nicosia
Kyrenia Str. Branch      Nicosia
Harbor Branch            Kyrenia

ix) Pattern match search condition (LIKE / NOT LIKE)

In a pattern match search condition, the pattern describes either the whole string or, with the help of wildcards, any portion of it starting at any character position. The % sign is a wildcard that stands for any sequence of zero or more characters, and the underscore ( _ ) stands for exactly one character.

For example:

Address LIKE 'H%' means the first character must be "H", but the rest of the string can be anything
Address LIKE 'H___' means that there must be exactly 4 characters in the string, the first of which must be "H"
Address LIKE '%e' means any sequence of characters of length at least 1 with the last character an "e"
Address LIKE '%Nicosia%' means a sequence of characters of any length containing "Nicosia"
Address NOT LIKE 'H%' means the first character cannot be "H"

Example: List all the stocks whose Stock-Name starts with "M", by Stock-Code, Stock-Price.

    SELECT Stock-Code, Stock-Price
    FROM Stock
    WHERE Stock-Name LIKE 'M%' ;

The result is:

Stock-Code   Stock-Price
MB 100       $300
MS 10        $9

x) Null search condition (IS NULL / IS NOT NULL)

Suppose that we have an entry in Sales that does not have a date. The missing date is stored as a Null value, which is neither zero ( 0 ) nor a blank string ( ' ' ). Therefore we cannot test it against these values, and the ordinary comparison operators cannot be used; the special test IS NULL is needed.

Example: List the entries in Sales by Stock-Code, Branch-Number where there is no date entry.
    SELECT Stock-Code, Branch-Number
    FROM Sales
    WHERE Sales-Date IS NULL ;

The result is:

Stock-Code   Branch-Number
KEY 04       B2

xi) Sorting the results (ORDER BY)

The sorting may be in ASC (ascending) or DESC (descending) order, and it may be applied to a single column or to multiple columns.

Example: List the branches in ascending order of Branch-Town, by Branch-Code, Branch-Name, Branch-Town.

    SELECT Branch-Code, Branch-Name, Branch-Town
    FROM Branch
    ORDER BY Branch-Town ASC ;

The result is:

Branch-Code   Branch-Name              Branch-Town
B3            Famagusta Branch         Famagusta
B4            Harbor Branch            Kyrenia
B1            Shopping Center Branch   Nicosia
B2            Kyrenia Str. Branch      Nicosia

SQL Aggregate Functions:

COUNT   Returns the number of values in a specified column
SUM     Returns the sum of the values in a specified column
AVG     Returns the average of the values in a specified column
MIN     Returns the smallest value in a specified column
MAX     Returns the largest value in a specified column

i) COUNT:

Example: Count the number of entries in the Stock relation.

    SELECT COUNT (Stock-Code) AS count
    FROM Stock ;

The result: count = 8

ii) SUM:

Example: Sum the Stock-Quantity in hand of those items whose price is less than $50.

    SELECT SUM (Stock-Quantity) AS sum
    FROM Stock
    WHERE Stock-Price < 50 ;

The result: sum = 28 + 42 + 245 = 315

iii) MAX:

Example: Find the stock item with the maximum sale price in Stock. A column that is not aggregated cannot be listed next to MAX without a GROUP BY clause, so the maximum is computed in a nested query and the row with that price is then selected.

    SELECT Stock-Name, Stock-Price
    FROM Stock
    WHERE Stock-Price = (SELECT MAX (Stock-Price) FROM Stock) ;

The result is:

Stock-Name   Stock-Price
Main Board   $300

iv) MIN:

Example: Find the stock item with the minimum sale price in Stock.

    SELECT Stock-Name, Stock-Price
    FROM Stock
    WHERE Stock-Price = (SELECT MIN (Stock-Price) FROM Stock) ;

The result is:

Stock-Name   Stock-Price
700 MB CD    $0.8

v) AVG:

Example: Find the average quantity per sale on 1/12/2003.

    SELECT AVG (Sales-Quantity) AS avg
    FROM Sales
    WHERE Sales-Date = '1/12/2003' ;

The result: avg = 3 (21 / 7)

The INSERT Statement:

The INSERT statement is used to add new entries to an existing DB table.
The format of the INSERT statement is as follows:

    INSERT INTO <Table-Name> [(column-list)]
    VALUES (data-value-list) ;

Example: Insert a new entry into the Stock table as below:

Stock Code   Stock Name   Stock Min Level   Sale Price   Stock in Hand
DIS 35       3.5" FDD     3                 $30          5

    INSERT INTO Stock (Stock-Code, Stock-Name, Stock-Min-Level, Stock-Price, Stock-Quantity)
    VALUES ('DIS 35', '3.5" FDD', 3, 30, 5) ;

If a value is supplied for every column, in the order in which the columns were defined, the column list may be omitted:

    INSERT INTO Stock
    VALUES ('DIS 35', '3.5" FDD', 3, 30, 5) ;

The UPDATE Statement:

An existing DB table entry can be modified using the UPDATE statement. The primary key of a row should normally be left unchanged by an update.

The format is:

    UPDATE <Table-Name>
    SET column-name1 = data-value1 [, column-name2 = data-value2, ...]
    [WHERE search-condition] ;

Example: If the firm decides to give all the items a price increase of 10%, then the Stock table should be updated as below.

    UPDATE Stock
    SET Stock-Price = Stock-Price * 1.1 ;

Since there is no WHERE clause, every row is updated. As a result the Stock table will look as below:

Stock Code   Stock Name    Stock Min Level   Sale Price   Stock in Hand
MON 15       15" Monitor   4                 $165         3
:            :             :                 :            :
SC 64        64 MB VGA     3                 $55          2

The DELETE Statement:

The DELETE statement allows rows to be deleted from the specified table.

The format is:

    DELETE FROM <Table-Name>
    [WHERE search-condition] ;

Example: To delete all rows in the Stock table:

    DELETE FROM Stock ;

To delete selected rows from the table:

    DELETE FROM Stock
    WHERE Stock-Price < 20 ;

'MS 10' and 'CDWR' will be deleted from the Stock table.
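The three modification statements can be tried out together with Python's built-in sqlite3 module. The sketch below is only an illustration under a few assumptions: the table lives in a temporary in-memory database, column names use underscores instead of the hyphens used in these notes (hyphenated names would have to be quoted in real SQL), and prices are stored as plain numbers without the $ sign.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute("""
    CREATE TABLE stock (
        stock_code      TEXT PRIMARY KEY,
        stock_name      TEXT,
        stock_min_level INTEGER,
        stock_price     REAL,
        stock_quantity  INTEGER
    )
""")
conn.executemany("INSERT INTO stock VALUES (?, ?, ?, ?, ?)", [
    ("MON 15", '15" Monitor', 4, 150, 3),
    ("KEY 04", "Keyboard", 12, 20, 28),
    ("DIS 20", "20 GB Disk", 6, 90, 14),
    ("MB 100", "Main Board", 2, 300, 1),
    ("CD 52", "52xCD Rom", 5, 60, 7),
    ("MS 10", "Mouse", 24, 9, 42),
    ("CDWR", "700 MB CD", 100, 0.8, 245),
    ("SC 64", "64 MB VGA", 3, 50, 2),
])

# INSERT: add the new 3.5" FDD item, naming every column explicitly.
conn.execute(
    "INSERT INTO stock (stock_code, stock_name, stock_min_level,"
    " stock_price, stock_quantity) VALUES (?, ?, ?, ?, ?)",
    ("DIS 35", '3.5" FDD', 3, 30, 5),
)
count_after_insert = conn.execute("SELECT COUNT(*) FROM stock").fetchone()[0]

# UPDATE: no WHERE clause, so every price rises by 10%.
conn.execute("UPDATE stock SET stock_price = stock_price * 1.1")
monitor_price = conn.execute(
    "SELECT stock_price FROM stock WHERE stock_code = 'MON 15'"
).fetchone()[0]

# DELETE: remove only the rows that satisfy the search condition.
conn.execute("DELETE FROM stock WHERE stock_price < 20")
remaining_codes = [
    row[0] for row in conn.execute("SELECT stock_code FROM stock ORDER BY stock_code")
]

print(count_after_insert)       # 9
print(round(monitor_price, 2))  # 165.0
print(remaining_codes)
```

Because the prices are stored as floating-point numbers, the updated monitor price is rounded before printing. After the price increase the keyboard costs 22, so the DELETE still removes only 'MS 10' and 'CDWR'.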