DBMS Complete Notes - BCA (University of Mysore)
NIE FIRST GRADE COLLEGE, Dept of Computer Science

Database

A database is a collection of related data.

Data: known facts that can be recorded and that have implicit meaning. Example: the names, telephone numbers, and addresses of the people you know. This collection of related data with an implicit meaning is a database.

A database has the following properties:
■ A database represents some aspect of the real world, sometimes called the miniworld or the universe of discourse (UoD). Changes to the miniworld are reflected in the database.
■ A database is a logically coherent collection of data with some inherent meaning. A random assortment of data cannot correctly be referred to as a database.
■ A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and some preconceived applications in which these users are interested.
■ Changes must be reflected in the database as soon as possible.
■ A database can be of any size and complexity.

Database management system (DBMS)

A database management system (DBMS) is a collection of programs that enables users to create and maintain a database. The DBMS is a general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications.

Defining: specifying the data types, structures, and constraints of the data to be stored in the database. The database definition or descriptive information is also stored by the DBMS in the form of a database catalog or dictionary; it is called meta-data.

Constructing: the process of storing the data on some storage medium that is controlled by the DBMS.
Manipulating: includes functions such as querying the database to retrieve specific data, updating the database to reflect changes in the miniworld, and generating reports from the data.

Sharing: allows multiple users and programs to access the database simultaneously.

Other important functions provided by the DBMS include:

Protecting: protecting the database and maintaining it over a long period of time. Protection includes system protection against hardware or software malfunction (or crashes) and security protection against unauthorized or malicious access.

Maintaining: a typical large database may have a life cycle of many years, so the DBMS must be able to maintain the database system by allowing the system to evolve as requirements change over time.

An application program accesses the database by sending queries or requests for data to the DBMS. A query typically causes some data to be retrieved; a transaction may cause some data to be read and some data to be written into the database.

Database applications include the following:

Traditional database applications: most of the information that is stored and accessed is either textual or numeric.

Multimedia databases: can store images, audio clips, and video streams digitally.

Geographic information systems (GIS): can store and analyze maps, weather data, and satellite images.

Data warehouses and online analytical processing (OLAP) systems: used in many companies to extract and analyze useful business information from very large databases to support decision making.

Real-time and active database technology: used to control industrial and manufacturing processes.
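The defining, constructing, and manipulating functions of a DBMS described above can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch: the `contact` table, its columns, the sample rows, and the helper names are assumptions made for the example, not part of the notes.

```python
import sqlite3


def build_contacts_db():
    """Define and construct a tiny address-book database in memory."""
    conn = sqlite3.connect(":memory:")
    # Defining: specify the structure, data types, and a constraint.
    conn.execute(
        "CREATE TABLE contact (name TEXT NOT NULL, phone TEXT, address TEXT)"
    )
    # Constructing: store the data under the control of the DBMS.
    conn.executemany(
        "INSERT INTO contact VALUES (?, ?, ?)",
        [
            ("Asha", "555-0101", "Mysore"),      # hypothetical sample rows
            ("Ravi", "555-0102", "Bangalore"),
        ],
    )
    conn.commit()
    return conn


def phone_of(conn, name):
    """Manipulating (query): retrieve specific data."""
    row = conn.execute(
        "SELECT phone FROM contact WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


def move(conn, name, new_address):
    """Manipulating (update): reflect a change in the miniworld."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE contact SET address = ? WHERE name = ?",
            (new_address, name),
        )


if __name__ == "__main__":
    conn = build_contacts_db()
    print(phone_of(conn, "Asha"))
    move(conn, "Ravi", "Mysore")
    print(conn.execute("SELECT name, address FROM contact ORDER BY name").fetchall())
```

The `with conn:` block groups the update into a transaction, which is the read-and-write unit of work mentioned above.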
Characteristics of the Database Approach (versus the File-Processing Approach)

The main characteristics of the database approach versus the file-processing approach are the following:
■ Self-describing nature of a database system
■ Insulation between programs and data, and data abstraction
■ Support of multiple views of the data
■ Sharing of data and multiuser transaction processing

Self-Describing Nature of a Database System

A fundamental characteristic of the database approach is that the database system contains not only the database itself but also a complete definition or description of the database structure and constraints. This definition is stored in the DBMS catalog, which contains information such as the structure of each file, the type and storage format of each data item, and various constraints on the data. The information stored in the catalog is called meta-data, and it describes the structure of the primary database.

Insulation between Programs and Data, and Data Abstraction

In traditional file processing, the structure of data files is embedded in the application programs, so any change to the structure of a file may require changing all programs that access that file. By contrast, DBMS access programs do not require such changes in most cases. The structure of data files is stored in the DBMS catalog separately from the access programs. We call this property program-data independence.

Support of Multiple Views of the Data

A database typically has many users, each of whom may require a different perspective or view of the database. A view may be a subset of the database, or it may contain virtual data that is derived from the database files but is not explicitly stored.
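As a rough illustration of a view holding virtual (derived) data and of the catalog that makes a database self-describing, here is a minimal sketch using Python's sqlite3 module. The `grade_report` table, the `student_average` view, and the sample values are assumptions chosen for the example; `sqlite_master` is SQLite's own catalog table.

```python
import sqlite3


def make_db():
    """Create a stored table and a view that derives data from it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE grade_report (
            student_number INTEGER,
            course_number  TEXT,
            points         REAL
        );
        INSERT INTO grade_report VALUES
            (17, 'CS1310', 3.0),
            (17, 'MATH2410', 4.0),
            (8,  'CS1310', 2.0);
        -- The average is never stored; it is computed whenever the view is read.
        CREATE VIEW student_average AS
            SELECT student_number, AVG(points) AS avg_points
            FROM grade_report
            GROUP BY student_number;
        """
    )
    return conn


def averages(conn):
    """Read the derived data through the view."""
    return conn.execute(
        "SELECT student_number, avg_points FROM student_average "
        "ORDER BY student_number"
    ).fetchall()


def catalog_entries(conn):
    """List what the catalog (meta-data) says the database contains."""
    return conn.execute(
        "SELECT type, name FROM sqlite_master ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(averages(conn))
    print(catalog_entries(conn))
```

Because the view is defined by a query, adding a row to `grade_report` changes what `student_average` returns without any separate update of the view.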
Sharing of Data and Multiuser Transaction Processing

A multiuser DBMS, as its name implies, must allow multiple users to access the database at the same time. This is essential if data for multiple applications is to be integrated and maintained in a single database. The DBMS must include concurrency control software to ensure that several users trying to update the same data do so in a controlled manner so that the result of the updates is correct. For example, when several reservation agents try to assign a seat on an airline flight, the DBMS should ensure that each seat can be accessed by only one agent at a time for assignment to a passenger. These types of applications are generally called online transaction processing (OLTP) applications. A fundamental role of multiuser DBMS software is to ensure that concurrent transactions operate correctly and efficiently.

Advantages of Using the DBMS Approach (or Additional Characteristics of Database Systems)

The advantages of using a DBMS, and the capabilities that a good DBMS should possess, are listed below. These capabilities are in addition to the four main characteristics already discussed. The DBA must utilize these capabilities to accomplish a variety of objectives related to the design, administration, and use of a large multiuser database.

1. Controlling Redundancy

Storing the same data multiple times leads to several problems. First, a single logical update (such as entering data on a new student) must be performed multiple times: once for each file where student data is recorded. This leads to duplication of effort. Second, storage space is wasted when the same data is stored repeatedly, and this problem may be serious for large databases. Third, files that represent the same data may become inconsistent.
In the database approach, the views of different user groups are integrated during database design. Ideally, we should have a database design that stores each logical data item (such as a student's name or birth date) in only one place in the database. This is known as data normalization, and it ensures consistency and saves storage space. It is sometimes necessary to use controlled redundancy to improve the performance of queries.

2. Restricting Unauthorized Access

When multiple users share a large database, it is likely that most users will not be authorized to access all information in the database. A DBMS should provide a security and authorization subsystem, which the DBA uses to create accounts and to specify account restrictions.

3. Providing Persistent Storage for Program Objects

Databases can be used to provide persistent storage for program objects and data structures. This is one of the main reasons for object-oriented database systems. Programming languages typically have complex data structures. The persistent storage of program objects and data structures is an important function of database systems.

4. Techniques for Efficient Query Processing

Database systems must provide capabilities for efficiently executing queries and updates. Because the database is typically stored on disk, the DBMS must provide specialized data structures and search techniques to speed up disk search for the desired records. Auxiliary files called indexes are used for this purpose.

5. Providing Backup and Recovery

A DBMS must provide facilities for recovering from hardware or software failures. The backup and recovery subsystem of the DBMS is responsible for recovery.

6. Providing Multiple User Interfaces

Because many types of users with varying levels of technical knowledge use a database, a DBMS should provide a variety of user interfaces.
These include query languages for casual users, programming language interfaces for application programmers, forms and command codes for parametric users, and menu-driven interfaces and natural language interfaces for standalone users. Both forms-style interfaces and menu-driven interfaces are commonly known as graphical user interfaces (GUIs).

7. Representing Complex Relationships among Data

A database may include numerous varieties of data that are interrelated in many ways. A DBMS must have the capability to represent a variety of complex relationships among the data, to define new relationships as they arise, and to retrieve and update related data easily and efficiently.

8. Enforcing Integrity Constraints

Most database applications have certain integrity constraints that must hold for the data. A DBMS should provide capabilities for defining and enforcing these constraints. The simplest type of integrity constraint involves specifying a data type for each data item.

9. Permitting Inferencing and Actions Using Rules

Some database systems provide capabilities for defining deduction rules for inferencing new information from the stored database facts. Such systems are called deductive database systems. In today's relational database systems, it is possible to associate triggers with tables. A trigger is a form of rule activated by updates to the table, which results in performing some additional operations on other tables, sending messages, and so on.

10. Additional Implications of Using the Database Approach

Using the database approach has some additional implications that can benefit most organizations:
a. Potential for Enforcing Standards
b. Reduced Application Development Time
c. Flexibility
d. Availability of Up-to-Date Information
e. Economies of Scale

Advantages and disadvantages of DBMS over conventional file system

Advantages
• Flexibility: Because programs and data are independent, programs do not have to be modified when types of unrelated data are added to or deleted from the database, or when physical storage changes.
• Fast response to information requests: Because data are integrated into a single database, complex requests can be handled much more rapidly than if the data were located in separate, non-integrated files. In many businesses, faster response means better customer service.
• Multiple accesses: Database software allows data to be accessed in a variety of ways, often by using several programming languages.
• Lower user training costs: Users often find it easier to learn such systems, and training costs may be reduced. Also, the total time taken to process requests may be shorter, which would increase user productivity.
• Less storage: Theoretically, all occurrences of data items need be stored only once, thereby eliminating the storage of redundant data. System developers and database designers often use data normalization to minimize data redundancy.

Disadvantages
1. Database systems are complex, difficult, and time-consuming to design.
2. Hardware and software start-up costs are substantial.
3. A DBMS exposes a business to the risk of losing critical data: in electronic format, data can be more readily stolen without proper security.
4. The cost of a DBMS can be prohibitive for small enterprises, which struggle to justify the investment in the infrastructure.
5. Improper use of the DBMS can lead to incorrect decision making, as people take for granted that the data is accurate as presented.
6. Data can be stolen through careless password security.

Problems with File Processing Systems

1. Program-Data Dependence: File descriptions are stored within each application program that accesses a given file.
2. Duplication of Data: Applications are developed independently in file processing systems, leading to unplanned duplicate files. Duplication is wasteful because it requires additional storage space, and a change in one file must be made manually in all files. This also results in loss of data integrity. It is also possible that the same data item has different names in different files, or that the same name is used for different data items in different files.
3. Limited Data Sharing: Each application has its own private files, with little opportunity to share data outside its own application. A requested report may require data from several incompatible files in separate systems.
4. Lengthy Development Times: There is little opportunity to leverage previous development efforts. Each new application requires the developer to start from scratch by designing new file formats and descriptions.
5. Excessive Program Maintenance: The preceding factors create a heavy program maintenance load.
6. Integrity Problem: The problem of integrity is the problem of ensuring that the data in the database is accurate.
7. Inconsistent Data.
8. Security: In file processing, a non-programmer can retrieve, modify, delete, and insert data directly in the files. A DBMS can prevent this through its security and authorization subsystem.
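The duplication problem listed above can be illustrated with a small Python sketch. It is a toy model, not a real file system or DBMS: the two dictionaries stand in for private files kept by two hypothetical applications (a registrar program and a library program), and all names and values are made up for the example.

```python
import copy


def file_processing_update(registrar_file, library_file, student_number, new_address):
    """File-processing style: the registrar program updates only its own file."""
    registrar_file = copy.deepcopy(registrar_file)
    registrar_file[student_number]["address"] = new_address
    # The library's private copy is untouched, so the two files now disagree.
    return registrar_file, library_file


def is_consistent(registrar_file, library_file, student_number):
    """Check whether both files hold the same address for one student."""
    return (
        registrar_file[student_number]["address"]
        == library_file[student_number]["address"]
    )


def shared_store_update(students, student_number, new_address):
    """Database style: each logical data item is stored in exactly one place."""
    students = copy.deepcopy(students)
    students[student_number]["address"] = new_address
    return students


def registrar_view(students, student_number):
    return students[student_number]["address"]


def library_view(students, student_number):
    return students[student_number]["address"]


if __name__ == "__main__":
    reg = {17: {"name": "Smith", "address": "12 Lake Rd"}}
    lib = {17: {"name": "Smith", "address": "12 Lake Rd"}}
    reg2, lib2 = file_processing_update(reg, lib, 17, "9 Hill St")
    print(is_consistent(reg2, lib2, 17))
    shared = shared_store_update(reg, 17, "9 Hill St")
    print(registrar_view(shared, 17), library_view(shared, 17))
```

In the shared-store version both applications read the single stored copy, so one update is enough and the copies cannot drift apart.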
DBMS classification criteria

We can classify DBMSs based on the following criteria: data model, number of users, number of sites, cost, types of access path options, and generality (general or special-purpose).

Data model
• Relational
• Object
• Hierarchical and network (legacy)
• Native XML DBMS

Number of users
• Single-user
• Multiuser

Number of sites
• Centralized
• Distributed (homogeneous or heterogeneous)

Cost
• Open source
• Different types of licensing

Types of access path options

General or special-purpose

Actors on the Scene (Users of Database)

Actors on the scene are the people whose jobs involve the day-to-day use of a large database.

1. Database Administrators

In any organization where many people use the same resources, there is a need for a chief administrator to oversee and manage these resources. In a database environment, the primary resource is the database itself, and the secondary resource is the DBMS and related software. Administering these resources is the responsibility of the database administrator (DBA). The DBA is responsible for authorizing access to the database, coordinating and monitoring its use, and acquiring software and hardware resources as needed. The DBA is accountable for problems such as security breaches and poor system response time. In large organizations, the DBA is assisted by a staff that carries out these functions.

2. Database Designers

Database designers are responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data. These tasks are mostly undertaken before the database is actually implemented and populated with data.
It is the responsibility of database designers to communicate with all prospective database users in order to understand their requirements and to create a design that meets these requirements. In many cases, the designers are on the staff of the DBA and may be assigned other staff responsibilities after the database design is completed.

3. End Users

End users are the people whose jobs require access to the database for querying, updating, and generating reports. There are several categories of end users:
■ Casual end users occasionally access the database, but they may need different information each time. Examples: middle- or high-level managers and other occasional browsers.
■ Naive or parametric end users constantly query and update the database, using standard types of queries and updates, called canned transactions, that have been carefully programmed and tested. Examples: bank tellers check account balances and post withdrawals and deposits; reservation agents for airlines, hotels, and car rental companies check availability for a given request and make reservations.
■ Sophisticated end users include engineers, scientists, business analysts, and others who thoroughly familiarize themselves with the facilities of the DBMS in order to implement their own applications to meet their complex requirements.
■ Standalone users maintain personal databases by using ready-made program packages that provide easy-to-use menu-based or graphics-based interfaces. Example: the user of a tax package that stores a variety of personal financial data for tax purposes.

4. System Analysts and Application Programmers (Software Engineers)

System analysts determine the requirements of end users, especially naive and parametric end users, and develop specifications for standard canned transactions that meet these requirements. Application programmers implement these specifications as programs; then they test, debug, document, and maintain these canned transactions.
Such analysts and programmers, commonly referred to as software developers or software engineers, should be familiar with the full range of capabilities provided by the DBMS to accomplish their tasks.

The functions of a database administrator are:
- Schema definition
- Storage structure and access method definition
- Schema and physical organization modification
- Granting authorization for data access
- Routine maintenance

Schema Definition
The database administrator creates the database schema by executing DDL statements. The schema includes the logical structure of each database table (relation), such as the data types of attributes, the lengths of attributes, integrity constraints, etc.

Storage structure and access method definition
Database tables or indexes are stored in structures such as flat files, heaps, B+ trees, etc.

Schema and physical organization modification
The DBA carries out changes to the existing schema and physical organization.

Granting authorization for data access
The DBA provides different access rights to users according to their level. Ordinary users might have highly restricted access to data, while users higher in the hierarchy, up to the administrator, get more access rights.

Routine Maintenance
Some of the routine maintenance activities of a DBA are given below.
- Taking backups of the database periodically.
- Ensuring that enough disk space is available at all times.
- Monitoring jobs running on the database.
- Ensuring that performance is not degraded by an expensive task submitted by some user.
- Performance tuning.

Data Models

One fundamental characteristic of the database approach is that it provides some level of data abstraction. Data abstraction generally refers to the suppression of details of data organization and storage, and the highlighting of the essential features for an improved understanding of data.
One of the main characteristics of the database approach is to support data abstraction so that different users can perceive data at their preferred level of detail. A data model (a collection of concepts that can be used to describe the structure of a database) provides the necessary means to achieve this abstraction. By structure of a database we mean the data types, relationships, and constraints that apply to the data.

Categories of Data Models

We can categorize data models according to the types of concepts they use to describe the database structure.

1. High-level or conceptual data models provide concepts that are close to the way many users perceive data. Conceptual data models use concepts such as entities, attributes, and relationships. An entity represents a real-world object or concept, such as an employee or a project from the miniworld that is described in the database. An attribute represents some property of interest that further describes an entity, such as the employee's name or salary. A relationship among two or more entities represents an association among the entities, for example, a works-on relationship between an employee and a project. The Entity-Relationship model is a popular high-level conceptual data model.

2. Low-level or physical data models provide concepts that describe the details of how data is stored on the computer storage media, typically magnetic disks. Concepts provided by low-level data models are generally meant for computer specialists, not for end users. Physical data models describe how data is stored as files in the computer by representing information such as record formats, record orderings, and access paths. An access path is a structure that makes the search for particular database records efficient. An index is an example of an access path that allows direct access to data using an index term or a keyword.

3. Representational (or implementation) data models provide concepts that may be easily understood by end users but that are not too far removed from the way data is organized in computer storage. Representational data models hide many details of data storage on disk but can be implemented on a computer system directly. They represent data by using record structures and hence are sometimes called record-based data models.

4. The object data model is an example of a new family of higher-level implementation data models that are closer to conceptual data models. A standard for object databases called the ODMG object model has been proposed by the Object Data Management Group (ODMG). Object data models are also frequently utilized as high-level conceptual models, particularly in the software engineering domain.

Categories of data models (summary)

Conceptual (high-level, semantic) data models: Provide concepts that are close to the way many users perceive data. (Also called entity-based or object-based data models.)

Physical (low-level, internal) data models: Provide concepts that describe details of how data is stored in the computer.

Implementation (representational) data models: Provide concepts that fall between the above two, balancing user views with some computer storage details.

Types of common Data Models

1. Hierarchical: The hierarchical database model rigidly structures data into an inverted "tree" in which each record contains two elements. The first is a single root or master field, often called a key, which identifies the type, location, or ordering of the record. The second is a variable number of subordinate fields, which define the rest of the data within a record.

2. Network: The network database model creates relationships among data through a linked-list structure in which subordinate records (called members) can be linked to more than one data element (called an owner).

3. Relational: The relational database model is based on the simple concept of flat tables consisting of rows and columns. In this model, a table is equivalent to a file, a row is equivalent to a record, and each column is equivalent to a field.

Comparison of the models:

Hierarchical database
  Advantages: Searching is fast and efficient.

Network database
  Advantages: Many more relationships between data elements can be defined. There is greater speed and efficiency than relational database models.
  Disadvantages: This is the most complicated model to design, implement, and maintain. It has greater query flexibility than the hierarchical model, but less than the relational model.

Relational database
  Advantages: Conceptual simplicity is its characteristic. There are no predefined relationships among data. High flexibility in ad hoc querying. New data and records can be added easily.
  Disadvantages: Processing efficiency and speed are lower. Data redundancy cannot be eliminated completely.

Emerging Data Models

1. Relational Multidimensional Database: Used for building data warehouses.
2. Object-Oriented: Consists of inter-related objects, similar to entities, each consisting of attributes, the methods associated with the object, and its behavior.
3. Hypermedia: Stores chunks of information in the form of nodes, for which links are established by the user, as per specific needs.
4. Geographical Information Database: Used for managing locational data for overlaying on maps and images.
5. Knowledge Database: Consists of decision rules used to evaluate situations and help users take decisions like an expert.

Schemas, Instances, and Database State

Database schema: In any data model, it is important to distinguish between the description of the database and the database itself. The description of a database is called the database schema, which is specified during database design and is not expected to change frequently.

Schema diagram: Most data models have certain conventions for displaying schemas as diagrams. A displayed schema is called a schema diagram.

Fig: Schema diagram for the database
STUDENT: Name, Student_number, Class, Major
COURSE: Course_name, Course_number, Credit_hours, Department
SECTION: Section_identifier, Course_number, Semester, Year, Instructor

Schema construct: We call each object in the schema, such as STUDENT or COURSE, a schema construct. A schema diagram displays only some aspects of a schema, such as the names of record types and data items, and some types of constraints. Other aspects are not specified in the schema diagram.

Database state: The actual data in a database may change quite frequently. For example, the database changes every time we add a new student or enter a new grade in the student database. The data in the database at a particular moment in time is called a database state or snapshot. It is also called the current set of occurrences or instances in the database.

The distinction between database schema and database state is very important. The schema is sometimes called the intension, and a database state is called an extension of the schema. Although, as mentioned earlier, the schema is not supposed to change frequently, it is not uncommon that changes occasionally need to be applied to the schema as the application requirements change.
For example, we may decide that another data item needs to be stored for each record in a file, such as adding Date_of_birth to the STUDENT schema. This is known as schema evolution.

Three-Schema Architecture

Three of the four important characteristics of the database approach are (1) use of a catalog to store the database description (schema) so as to make it self-describing, (2) insulation of programs and data (program-data and program-operation independence), and (3) support of multiple user views. In this section we specify an architecture for database systems, called the three-schema architecture, that was proposed to help achieve and visualize these characteristics.

The Three-Schema Architecture

The goal of the three-schema architecture is to separate the user applications from the physical database. In this architecture, schemas can be defined at the following three levels:

1. The internal level has an internal schema, which describes the physical storage structure of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database.

2. The conceptual level has a conceptual schema, which describes the structure of the whole database for a community of users. The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints. Usually, a representational data model is used to describe the conceptual schema when a database system is implemented. This implementation conceptual schema is often based on a conceptual schema design in a high-level data model.

3. The external or view level includes a number of external schemas or user views.
Each external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group. As in the previous level, each external schema is typically implemented using a representational data model, possibly based on an external schema design in a high-level data model.

Fig: The three-schema architecture

The processes of transforming requests and results between levels are called mappings.

Data Independence

The three-schema architecture can be used to further explain the concept of data independence, which can be defined as the capacity to change the schema at one level of a database system without having to change the schema at the next higher level. We can define two types of data independence:

1. Logical data independence is the capacity to change the conceptual schema without having to change external schemas or application programs. We may change the conceptual schema to expand the database (by adding a record type or data item), to change constraints, or to reduce the database (by removing a record type or data item). In the last case, external schemas that refer only to the remaining data should not be affected. Only the view definition and the mappings need to be changed in a DBMS that supports logical data independence. After the conceptual schema undergoes a logical reorganization, application programs that reference the external schema constructs must work as before. Changes to constraints can be applied to the conceptual schema without affecting the external schemas or application programs.

2. Physical data independence is the capacity to change the internal schema without having to change the conceptual schema. Hence, the external schemas need not be changed either.
Changes to the internal schema may be needed because some physical files were reorganized (for example, by creating additional access structures) to improve the performance of retrieval or update. If the same data as before remains in the database, we should not have to change the conceptual schema. For example, providing an access path to improve retrieval speed of section records by semester and year should not require a query such as "list all sections offered in fall 2008" to be changed, although the query would be executed more efficiently by the DBMS by utilizing the new access path.

Data independence occurs because when the schema is changed at some level, the schema at the next higher level remains unchanged; only the mapping between the two levels is changed. Hence, application programs referring to the higher-level schema need not be changed.

DBMS Languages

1. Data definition language (DDL): After designing a database, we have to choose a DBMS to implement it. The first step is then to specify the conceptual and internal schemas for the database and any mappings between the two. The data definition language (DDL) is used by the DBA and by database designers to define both schemas. The DBMS has a DDL compiler whose function is to process DDL statements in order to identify descriptions of the schema constructs and to store the schema description in the DBMS catalog. In DBMSs where a clear separation is maintained between the conceptual and internal levels, the DDL is used to specify the conceptual schema only.

2. Storage definition language (SDL): Used to specify the internal schema. The mappings between the two schemas may be specified in either one of these languages. In most relational DBMSs today, there is no specific language that performs the role of SDL. Instead, the internal schema is specified by a combination of functions, parameters, and specifications related to storage.
These permit the DBA staff to control indexing choices and mapping of data to storage.

3. View definition language (VDL): The VDL is used to specify user views and their mappings to the conceptual schema, but in most DBMSs the DDL is used to define both conceptual and external schemas. In relational DBMSs, SQL is used in the role of VDL to define user or application views as results of predefined queries.

4. Data manipulation language (DML): Once the database schemas are compiled and the database is populated with data, users must have some means to manipulate the database. Typical manipulations include retrieval, insertion, deletion, and modification of the data. The DBMS provides a set of operations or a language called the data manipulation language (DML) for these purposes.

There are two main types of DMLs.

A high-level or nonprocedural DML can be used on its own to specify complex database operations concisely. High-level DMLs, such as SQL, can specify and retrieve many records in a single DML statement; therefore, they are called set-at-a-time or set-oriented DMLs. A query in a high-level DML often specifies which data to retrieve rather than how to retrieve it; therefore, such languages are also called declarative.

A low-level or procedural DML must be embedded in a general-purpose programming language. This type of DML typically retrieves individual records or objects from the database and processes each separately. Therefore, it needs to use programming language constructs, such as looping, to retrieve and process each record from a set of records. Low-level DMLs are also called record-at-a-time DMLs because of this property.

Whenever DML commands, whether high level or low level, are embedded in a general-purpose programming language, that language is called the host language and the DML is called the data sublanguage.
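The contrast between set-at-a-time and record-at-a-time processing can be sketched with Python as the host language and SQL as the data sublanguage. This is only an illustration under stated assumptions: it uses an in-memory SQLite database, and the employee table, its columns, the sample rows, and the function names are all hypothetical and not part of these notes.

```python
import sqlite3


def make_db():
    """Create a small in-memory database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [("111", "Asha", 1000), ("222", "Ravi", 2000), ("333", "Meena", 3000)],
    )
    return conn


def raise_set_at_a_time(conn, amount):
    # High-level (declarative) style: one statement states WHAT to change;
    # the DBMS decides how to reach every matching record.
    conn.execute("UPDATE employee SET salary = salary + ?", (amount,))


def raise_record_at_a_time(conn, amount):
    # Procedural style: the host program loops over the records itself
    # and updates each one separately.
    rows = conn.execute("SELECT ssn, salary FROM employee").fetchall()
    for ssn, salary in rows:
        conn.execute(
            "UPDATE employee SET salary = ? WHERE ssn = ?", (salary + amount, ssn)
        )


def salaries(conn):
    """Return all salaries ordered by ssn, for comparison."""
    return [row[0] for row in conn.execute("SELECT salary FROM employee ORDER BY ssn")]
```

Both functions leave the table in the same state; the difference lies in who controls the iteration, the DBMS or the host program.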
On the other hand, a high-level DML used in a standalone interactive manner is called a query language.

Examples (various commands) for DBMS Languages

1. DML
DML is the abbreviation of Data Manipulation Language. It is used to retrieve, insert, update, and delete data in a database.
SELECT – Retrieves data from a table
INSERT – Inserts data into a table
UPDATE – Updates existing data within a table
DELETE – Deletes records from a table (all records if no condition is given)

2. DDL
DDL is the abbreviation of Data Definition Language. It is used to create and modify the structure of database objects.
CREATE – Creates objects in the database
ALTER – Alters objects of the database
DROP – Deletes objects of the database
TRUNCATE – Deletes all records from a table and resets the table identity to its initial value

3. DCL
DCL is the abbreviation of Data Control Language. It is used to create roles and permissions and to control access to the database by securing it.
GRANT – Gives users access privileges to the database
REVOKE – Withdraws access privileges that were given with the GRANT command

4. TCL
TCL is the abbreviation of Transaction Control Language. It is used to manage the transactions occurring within a database.
COMMIT – Saves the work done in a transaction
ROLLBACK – Restores the database to its state as of the last COMMIT
SAVE TRANSACTION – Sets a savepoint within a transaction (SAVEPOINT in standard SQL)

DBMS Interfaces

User-friendly interfaces provided by a DBMS may include the following:

1. Menu-Based Interfaces for Web Clients or Browsing: These interfaces present the user with lists of options (called menus) that lead the user through the formulation of a request.
Menus do away with the need to memorize the specific commands and syntax of a query language; rather, the query is composed step by step by picking options from a menu that is displayed by the system. Pull-down menus are a very popular technique in Web-based user interfaces.

2. Forms-Based Interfaces: A forms-based interface displays a form to each user. Users can fill out all of the form entries to insert new data, or they can fill out only certain entries, in which case the DBMS will retrieve matching data for the remaining entries. Forms are usually designed and programmed for naive users as interfaces to canned transactions.

3. Graphical User Interfaces: A GUI typically displays a schema to the user in diagrammatic form. The user can then specify a query by manipulating the diagram. In many cases, GUIs utilize both menus and forms. Most GUIs use a pointing device, such as a mouse, to select certain parts of the displayed schema diagram.

4. Natural Language Interfaces: These interfaces accept requests written in English or some other language and attempt to understand them. A natural language interface usually has its own schema, which is similar to the database conceptual schema, as well as a dictionary of important words. The natural language interface refers to the words in its schema, as well as to the set of standard words in its dictionary, to interpret the request.

5. Speech Input and Output: Limited use of speech as an input query and speech as an answer to a question or result of a request is becoming commonplace. Applications with limited vocabularies, such as inquiries for telephone directory, flight arrival/departure, and credit card account information, allow speech for input and output to enable customers to access this information.

6. Interfaces for Parametric Users: Parametric users, such as bank tellers, often have a small set of operations that they must perform repeatedly. For example, a teller is able to use single function keys to invoke routine and repetitive transactions such as account deposits or withdrawals, or balance inquiries.

7. Interfaces for the DBA: Most database systems contain privileged commands that can be used only by the DBA staff. These include commands for creating accounts, setting system parameters, granting account authorization, changing a schema, and reorganizing the storage structures of a database.

The Database System Environment

The database system environment consists of three parts:

1. DBMS Component Modules

The figure illustrates, in a simplified form, the typical DBMS components. The figure is divided into two parts. The top part refers to the various users of the database environment and their interfaces. The lower part shows the internals of the DBMS responsible for storage of data and processing of transactions.

A higher-level stored data manager module of the DBMS controls access to DBMS information that is stored on disk, whether it is part of the database or the catalog.

In the top part of the figure, the DDL compiler processes schema definitions, specified in the DDL, and stores descriptions of the schemas (meta-data) in the DBMS catalog.

Casual users and persons with occasional need for information from the database interact using some form of interface, which we call the interactive query interface in the figure.

Queries are parsed and validated for correctness of the query syntax, the names of files and data elements, and so on by a query compiler that compiles them into an internal form.
The query optimizer is concerned with the rearrangement and possible reordering of operations, elimination of redundancies, and use of correct algorithms and indexes during execution.

The precompiler extracts DML commands from an application program written in a host programming language. These commands are sent to the DML compiler for compilation into object code for database access.

In the lower part of the figure, the runtime database processor executes (1) the privileged commands, (2) the executable query plans, and (3) the canned transactions with runtime parameters. The runtime database processor also handles other aspects of data transfer, such as management of buffers in the main memory. It works with the system catalog and may update it with statistics. It also works with the stored data manager, which in turn uses basic operating system services for carrying out low-level input/output (read/write) operations between the disk and main memory.

Some DBMSs have their own buffer management module, while others depend on the OS for buffer management.

Concurrency control and backup and recovery systems are integrated into the working of the runtime database processor for purposes of transaction management.

2. Database System Utilities

In addition to possessing the software modules just described, most DBMSs have database utilities that help the DBA manage the database system. Common utilities have the following types of functions:

■ Loading: A loading utility is used to load existing data files, such as text files or sequential files, into the database.

■ Backup: A backup utility creates a backup copy of the database, usually by dumping the entire database onto tape or another mass storage medium. The backup copy can be used to restore the database in case of catastrophic disk failure.

■ Database storage reorganization: This utility can be used to reorganize a set of database files into different file organizations and to create new access paths to improve performance.
■ Performance monitoring: Such a utility monitors database usage and provides statistics to the DBA. The DBA uses the statistics in making decisions such as whether or not to reorganize files or whether to add or drop indexes to improve performance.

Other utilities may be available for sorting files, handling data compression, monitoring access by users, interfacing with the network, and performing other functions.

3. Tools, Application Environments, and Communications Facilities

CASE (Computer Aided Software Engineering) tools are used in the design phase of database systems.

Data dictionary (or data repository) system: In addition to catalog information about schemas and constraints, the data dictionary stores other information, such as design decisions, usage standards, application program descriptions, and user information. Such a system is also called an information repository. This information can be accessed directly by users or the DBA when needed. A data dictionary utility is similar to the DBMS catalog, but it includes a wider variety of information and is accessed mainly by users rather than by the DBMS software.

Application development environments, such as PowerBuilder (Sybase) or JBuilder (Borland), have been quite popular. These systems provide an environment for developing database applications and include facilities that help in many facets of database systems, including database design, GUI development, querying and updating, and application program development.

The DBMS also needs to interface with communications software, whose function is to allow users at locations remote from the database system site to access the database through computer terminals, workstations, or personal computers. These are connected to the database site through data communications hardware such as Internet routers, phone lines, long-haul networks, local networks, or satellite communication devices.
Data Modeling Using the Entity-Relationship (ER) Model

Entity Types, Entity Sets, Attributes, and Keys

The ER model describes data as entities, relationships, and attributes.

The basic object that the ER model represents is an entity, which is a thing in the real world with an independent existence. An entity may be an object with a physical existence (for example, a particular person, car, house, or employee) or it may be an object with a conceptual existence (for instance, a company, a job, or a university course).

Each entity has attributes, the particular properties that describe it. For example, an EMPLOYEE entity may be described by the employee's name, age, address, salary, and job. A particular entity will have a value for each of its attributes. The attribute values that describe each entity become a major part of the data stored in the database.

Entities and Their Attributes:

Figure: Two entities, EMPLOYEE e1 and COMPANY c1, and their attributes.

Several types of attributes occur in the ER model: simple versus composite, single-valued versus multivalued, and stored versus derived.

1. Composite versus Simple (Atomic) Attributes

Composite attributes can be divided into smaller subparts, which represent more basic attributes with independent meanings. For example, the Address attribute of the EMPLOYEE entity can be subdivided into Street_address, City, State, and Zip, with the values '2311 Kirby', 'Houston', 'Texas', and '77001'. Composite attributes can form a hierarchy; for example, Street_address can be further subdivided into three simple component attributes: Number, Street, and Apartment_number.
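The Address hierarchy described above can be sketched as nested record types. This is only an illustration: the class names, field names, and the helper function are assumptions made for the example, and the ER model itself does not prescribe any programming-language representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StreetAddress:
    # Simple (atomic) components of the composite attribute Street_address.
    number: str
    street: str
    apartment_number: Optional[str] = None  # no value for non-apartment addresses


@dataclass(frozen=True)
class Address:
    # Composite attribute: one component is itself composite.
    street_address: StreetAddress
    city: str
    state: str
    zip_code: str


def simple_attributes(address: Address) -> dict:
    """Flatten the hierarchy into its simple component attributes."""
    street = address.street_address
    return {
        "Number": street.number,
        "Street": street.street,
        "Apartment_number": street.apartment_number,
        "City": address.city,
        "State": address.state,
        "Zip": address.zip_code,
    }


# The example values from the text: '2311 Kirby', 'Houston', 'Texas', '77001'.
kirby = Address(StreetAddress("2311", "Kirby"), "Houston", "Texas", "77001")
```

Calling simple_attributes(kirby) returns the leaves of the hierarchy, which is the level at which individual components such as City or Zip can be referenced on their own.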
Figure: A hierarchy of composite attributes

Attributes that are not divisible are called simple or atomic attributes. The value of a composite attribute is the concatenation of the values of its component simple attributes.

If the composite attribute is referenced only as a whole, there is no need to subdivide it into component attributes. For example, if there is no need to refer to the individual components of an address (Zip code, street, and so on), then the whole address can be designated as a simple attribute.

2. Single-Valued versus Multivalued Attributes

Most attributes have a single value for a particular entity; such attributes are called single-valued. For example, Age is a single-valued attribute of a person.

In some cases an attribute can have a set of values for the same entity, for instance, a Colors attribute for a car or a College_degrees attribute for a person. Cars with one color have a single value, whereas two-tone cars have two color values. Similarly, one person may have no college degree, another person may have one, and a third person may have two or more degrees; therefore, different people can have different numbers of values for the College_degrees attribute. Such attributes are called multivalued.

A multivalued attribute may have lower and upper bounds to constrain the number of values allowed for each individual entity. For example, the Colors attribute of a car may be restricted to have between one and three values, if we assume that a car can have three colors at most.

3. Stored versus Derived Attributes

In some cases, two (or more) attribute values are related, for example, the Age and Birth_date attributes of a person. For a particular person entity, the value of Age can be determined from the current (today's) date and the value of that person's Birth_date.
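As a small worked example, Age can be computed from Birth_date and the current date instead of being stored. The function name and its parameters are assumptions made for this sketch.

```python
from datetime import date


def derive_age(birth_date: date, today: date) -> int:
    """Compute Age (in completed years) from Birth_date and the current date."""
    years = today.year - birth_date.year
    # Subtract one year if this year's birthday has not yet been reached.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years
```

Passing the current date explicitly keeps the computation repeatable; an application would typically supply date.today().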
The Age attribute is hence called a derived attribute and is said to be derivable from the Birth_date attribute, which is called a stored attribute. Some attribute values can be derived from related entities; for example, an attribute Number_of_employees of a DEPARTMENT entity can be derived by counting the number of employees related to (working for) that department.

4. NULL Values

In some cases, a particular entity may not have an applicable value for an attribute. For example, the Apartment_number attribute of an address applies only to addresses that are in apartment buildings and not to other types of residences, such as single-family homes. Similarly, a College_degrees attribute applies only to people with college degrees. For such situations, a special value called NULL is created. An address of a single-family home would have NULL for its Apartment_number attribute, and a person with no college degree would have NULL for College_degrees. NULL can also be used if we do not know the value of an attribute for a particular entity.

5. Complex Attributes

Notice that, in general, composite and multivalued attributes can be nested arbitrarily. We can represent arbitrary nesting by grouping components of a composite attribute between parentheses ( ) and separating the components with commas, and by displaying multivalued attributes between braces { }. Such attributes are called complex attributes. For example, if a person can have more than one residence and each residence can have a single address and multiple phones, an attribute Address_phone for a person can be specified as shown in the figure. Both Phone and Address are themselves composite attributes.

Figure: A complex attribute, Address_phone.
{Address_phone( {Phone(Area_code, Phone_number)}, Address(Street_address(Number, Street, Apartment_number), City, State, Zip) )}

Entity Types, Entity Sets, Keys, and Value Sets

Entity Types and Entity Sets: A database usually contains groups of entities that are similar. For example, a company employing hundreds of employees may want to store similar information concerning each of the employees. These employee entities share the same attributes, but each entity has its own value(s) for each attribute.

An entity type defines a collection (or set) of entities that have the same attributes. Each entity type in the database is described by its name and attributes. The figure shows two entity types, EMPLOYEE and COMPANY, and a list of some of the attributes for each. A few individual entities of each type are also illustrated, along with the values of their attributes.

The collection of all entities of a particular entity type in the database at any point in time is called an entity set; the entity set is usually referred to using the same name as the entity type. For example, EMPLOYEE refers to both a type of entity and the current set of all employee entities in the database.

An entity type is represented in ER diagrams as a rectangular box enclosing the entity type name. Attribute names are enclosed in ovals and are attached to their entity type by straight lines. Composite attributes are attached to their component attributes by straight lines. Multivalued attributes are displayed in double ovals. The figure shows a CAR entity type in this notation.

An entity type describes the schema or intension for a set of entities that share the same structure. The collection of entities of a particular entity type is grouped into an entity set, which is also called the extension of the entity type.
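The distinction between an entity type (the intension) and its entity set (the extension) can be sketched as follows. The EntityType class, its method name, and the sample employee values are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class EntityType:
    name: str                     # e.g. "EMPLOYEE"
    attributes: tuple             # the intension: the shared structure
    entity_set: list = field(default_factory=list)  # the extension: current entities

    def add_entity(self, **values):
        # Every entity of the type must carry exactly the type's attributes.
        if set(values) != set(self.attributes):
            raise ValueError(
                f"{self.name} entities must have exactly the attributes {self.attributes}"
            )
        self.entity_set.append(values)


# Hypothetical sample data.
employee = EntityType("EMPLOYEE", ("Name", "Age", "Salary"))
employee.add_entity(Name="John Smith", Age=55, Salary=80000)
employee.add_entity(Name="Fred Brown", Age=40, Salary=30000)
```

The attributes tuple stays fixed while entity_set changes as entities are added or removed, which mirrors the statement that the entity set is the collection of entities at a particular point in time.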
Key Attributes of an Entity Type

An important constraint on the entities of an entity type is the key or uniqueness constraint on attributes. An entity type usually has one or more attributes whose values are distinct for each individual entity in the entity set. Such an attribute is called a key attribute, and its values can be used to identify each entity uniquely. For example, the Name attribute is a key of the COMPANY entity type because no two companies are allowed to have the same name. For the PERSON entity type, a typical key attribute is Ssn (Social Security number). Sometimes several attributes together form a key, meaning that the combination of the attribute values must be distinct for each entity. In ER diagrammatic notation, each key attribute has its name underlined inside the oval.

Specifying that an attribute is a key of an entity type means that the preceding uniqueness property must hold for every entity set of the entity type. Hence, it is a constraint that prohibits any two entities from having the same value for the key attribute at the same time.

Some entity types have more than one key attribute. For example, each of the Vehicle_id and Registration attributes of the entity type CAR is a key in its own right. The Registration attribute is an example of a composite key formed from two simple component attributes, State and Number, neither of which is a key on its own. An entity type may also have no key, in which case it is called a weak entity type. In our diagrammatic notation, if two attributes are underlined separately, then each is a key on its own.
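The uniqueness property can be checked for a given entity set with a few lines of code. The sketch below assumes entities are represented as dictionaries; the function name and the sample CAR data are hypothetical. Note that it tests only one entity set, whereas a key constraint must hold for every entity set of the type.

```python
def satisfies_key(entity_set, key_attributes):
    """Return True if no two entities share the same combined key value."""
    seen = set()
    for entity in entity_set:
        # For a composite key, the combination of values must be distinct.
        value = tuple(entity[attr] for attr in key_attributes)
        if value in seen:
            return False
        seen.add(value)
    return True


# Hypothetical CAR entities; Registration is the composite (State, Number).
cars = [
    {"Vehicle_id": "V1", "State": "TX", "Number": "ABC123"},
    {"Vehicle_id": "V2", "State": "TX", "Number": "TSY650"},
    {"Vehicle_id": "V3", "State": "NY", "Number": "ABC123"},
]
```

With this data, Vehicle_id alone and the pair (State, Number) both satisfy the check, while State alone and Number alone do not, matching the description of Registration as a composite key.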
Unlike the relational model, there is no concept of primary key in the ER model that we present here; the primary key will be chosen during mapping to a relational schema.

Value Sets (Domains) of Attributes

Each simple attribute of an entity type is associated with a value set (or domain of values), which specifies the set of values that may be assigned to that attribute for each individual entity. If the range of ages allowed for employees is between 16 and 70, we can specify the value set of the Age attribute of EMPLOYEE to be the set of integer numbers between 16 and 70. Similarly, we can specify the value set for the Name attribute to be the set of strings of alphabetic characters separated by blank characters, and so on. Value sets are not displayed in ER diagrams and are typically specified using the basic data types available in most programming languages, such as integer, string, Boolean, float, enumerated type, subrange, and so on. Additional data types to represent common database types such as date, time, and other concepts are also employed.

Mathematically, an attribute A of entity set E whose value set is V can be defined as a function from E to the power set P(V) of V:

A : E → P(V)

We refer to the value of attribute A for entity e as A(e). The previous definition covers both single-valued and multivalued attributes, as well as NULLs. A NULL value is represented by the empty set.

Initial Conceptual Design of the COMPANY Database

We can now define the entity types for the COMPANY database, based on the requirements. We first define several entity types and their attributes; then we introduce the concept of a relationship. We can identify four entity types, one corresponding to each of the four items in the specification:

1. An entity type DEPARTMENT with attributes Name, Number, Locations, Manager, and Manager_start_date. Locations is the only multivalued attribute. We can specify that both Name and Number are (separate) key attributes because each was specified to be unique.

2. An entity type PROJECT with attributes Name, Number, Location, and Controlling_department. Both Name and Number are (separate) key attributes.

3. An entity type EMPLOYEE with attributes Name, Ssn, Sex, Address, Salary, Birth_date, Department, and Supervisor. Both Name and Address may be composite attributes; however, this was not specified in the requirements. We must go back to the users to see if any of them will refer to the individual components of Name (First_name, Middle_initial, Last_name) or of Address.

4. An entity type DEPENDENT with attributes Employee, Dependent_name, Sex, Birth_date, and Relationship (to the employee).

Relationship Types, Relationship Sets, Roles, and Structural Constraints

There are several implicit relationships among the various entity types. In fact, whenever an attribute of one entity type refers to another entity type, some relationship exists. For example, the attribute Manager of DEPARTMENT refers to an employee who manages the department. In the ER model, these references should not be represented as attributes but as relationships.

Relationship Types, Sets, and Instances

A relationship type R among n entity types E1, E2, ..., En defines a set of associations, or a relationship set, among entities from these entity types. As for the case of entity types and entity sets, a relationship type and its corresponding relationship set are customarily referred to by the same name, R.
For example, consider a relationship type WORKS_FOR between the two entity types EMPLOYEE and DEPARTMENT, which associates each employee with the department for which the employee works in the corresponding entity set. Each relationship instance in the relationship set WORKS_FOR associates one EMPLOYEE entity and one DEPARTMENT entity. The figure illustrates this example.

In ER diagrams, relationship types are displayed as diamond-shaped boxes, which are connected by straight lines to the rectangular boxes representing the participating entity types. The relationship name is displayed in the diamond-shaped box.

Relationship Degree, Role Names, and Recursive Relationships

Degree of a Relationship Type: The degree of a relationship type is the number of participating entity types. Hence, the WORKS_FOR relationship is of degree two. A relationship type of degree two is called binary, and one of degree three is called ternary.

Role Names and Recursive Relationships

Each entity type that participates in a relationship type plays a particular role in the relationship. The role name signifies the role that a participating entity from the entity type plays in each relationship instance, and helps to explain what the relationship means. For example, in the WORKS_FOR relationship type, EMPLOYEE plays the role of employee or worker and DEPARTMENT plays the role of department or employer.

Role names are not technically necessary in relationship types where all the participating entity types are distinct, since each participating entity type name can be used as the role name. However, in some cases the same entity type participates more than once in a relationship type in different roles.
In such cases the role name becomes essential for distinguishing the meaning of the role that each participating entity plays. Such relationship types are called recursive relationships. The figure shows an example. The SUPERVISION relationship type relates an employee to a supervisor, where both employee and supervisor entities are members of the same EMPLOYEE entity set. Hence, the EMPLOYEE entity type participates twice in SUPERVISION: once in the role of supervisor (or boss), and once in the role of supervisee (or subordinate).

Constraints on Binary Relationship Types

Relationship types usually have certain constraints that limit the possible combinations of entities that may participate in the corresponding relationship set. These constraints are determined from the miniworld situation that the relationships represent. We can distinguish two main types of binary relationship constraints: cardinality ratio and participation.

1. Cardinality Ratios for Binary Relationships

The cardinality ratio for a binary relationship specifies the maximum number of relationship instances that an entity can participate in. For example, in the WORKS_FOR binary relationship type, DEPARTMENT:EMPLOYEE is of cardinality ratio 1:N, meaning that each department can be related to (that is, employs) any number of employees (N indicates there is no maximum number), but an employee can be related to (work for) at most one department.

The possible cardinality ratios for binary relationship types are 1:1, 1:N, N:1, and M:N.
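To make the four ratios concrete, the sketch below reports which ratio a given set of relationship instances exhibits. It is only an illustration: the function name and the sample pairs are hypothetical, and a real cardinality ratio is a constraint taken from the miniworld rules, not something inferred from whatever data happens to be stored.

```python
from collections import Counter


def observed_cardinality_ratio(relationship_set):
    """Classify (left, right) pairs as '1:1', '1:N', 'N:1', or 'M:N' (left:right)."""
    pairs = set(relationship_set)  # ignore duplicate instances
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    # The left side is "many" if some right entity is related to several left entities.
    left_many = any(count > 1 for count in right_counts.values())
    # The right side is "many" if some left entity is related to several right entities.
    right_many = any(count > 1 for count in left_counts.values())
    if left_many and right_many:
        return "M:N"
    if left_many:
        return "N:1"
    if right_many:
        return "1:N"
    return "1:1"


# Hypothetical (DEPARTMENT, EMPLOYEE) instances of WORKS_FOR.
works_for = [("Research", "e1"), ("Research", "e2"), ("Administration", "e3")]
```

For the sample WORKS_FOR instances, a department appears with several employees while each employee appears with one department, so the function reports the DEPARTMENT:EMPLOYEE ratio as 1:N.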
1:1 Cardinality Ratio: An example of a 1:1 binary relationship is MANAGES, which relates a department entity to the employee who manages that department. This represents the miniworld constraints that, at any point in time, an employee can manage one department only and a department can have one manager only.

M:N Cardinality Ratio: The relationship type WORKS_ON is of cardinality ratio M:N, because the miniworld rule is that an employee can work on several projects and a project can have several employees.

Figure: An M:N relationship, WORKS_ON.

N:1 Cardinality Ratio: An employee can work for only one department, but one department can have many employees.

Figure: EMPLOYEE (N), WORKS_FOR, DEPARTMENT (1).

1:N Cardinality Ratio: One project is controlled by only one department, but one department can control many projects.

Cardinality ratios for binary relationships are represented on ER diagrams by displaying 1, M, and N on the diamonds as shown in the figure.

2. Participation Constraints and Existence Dependencies

The participation constraint specifies whether the existence of an entity depends on its being related to another entity via the relationship type. This constraint specifies the minimum number of relationship instances that each entity can participate in, and is sometimes called the minimum cardinality constraint.

There are two types of participation constraints, total and partial, that we illustrate by example. If a company policy states that every employee must work for a department, then an employee entity can exist only if it participates in at least one WORKS_FOR relationship instance (Figure below).
Thus, the participation of EMPLOYEE in WORKS_FOR is called total participation, meaning that every entity in the total set of employee entities must be related to a department entity via WORKS_FOR. Total participation is also called existence dependency.

Fig: Total Participation

In the figure below, we do not expect every employee to manage a department, so the participation of EMPLOYEE in the MANAGES relationship type is partial, meaning that some or part of the set of employee entities are related to some department entity via MANAGES, but not necessarily all.

Fig: Partial Participation

We will refer to the cardinality ratio and participation constraints, taken together, as the structural constraints of a relationship type.

In ER diagrams, total participation (or existence dependency) is displayed as a double line connecting the participating entity type to the relationship, whereas partial participation is represented by a single line (see Figure). Notice that in this notation, we can either specify no minimum (partial participation) or a minimum of one (total participation).

Attributes of Relationship Types

Relationship types can also have attributes, similar to those of entity types. For example, to record the number of hours per week that an employee works on a particular project, we can include an attribute Hours for the WORKS_ON relationship type. Another example is to include the date on which a manager started managing a department via an attribute Start_date for the MANAGES relationship type.

Weak Entity Types

Entity types that do not have key attributes of their own are called weak entity types.
In contrast, regular entity types that do have a key attribute, which include all the examples discussed so far, are called strong entity types.

Common Notations Used in ER Diagrams

Refining the ER Design for the COMPANY Database

In our example, we specify the following relationship types:

■ MANAGES, a 1:1 relationship type between EMPLOYEE and DEPARTMENT. EMPLOYEE participation is partial. DEPARTMENT participation is not clear from the requirements. We question the users, who say that a department must have a manager at all times, which implies total participation. The attribute Start_date is assigned to this relationship type.

■ WORKS_FOR, a 1:N relationship type between DEPARTMENT and EMPLOYEE. Both participations are total.

■ CONTROLS, a 1:N relationship type between DEPARTMENT and PROJECT. The participation of PROJECT is total, whereas that of DEPARTMENT is determined to be partial, after consultation with the users indicates that some departments may control no projects.

■ SUPERVISION, a 1:N relationship type between EMPLOYEE (in the supervisor role) and EMPLOYEE (in the supervisee role). Both participations are determined to be partial, after the users indicate that not every employee is a supervisor and not every employee has a supervisor.

■ WORKS_ON, determined to be an M:N relationship type with attribute Hours, after the users indicate that a project can have several employees working on it. Both participations are determined to be total.
■ DEPENDENTS_OF, a 1:N relationship type between EMPLOYEE and DEPENDENT, which is also the identifying relationship for the weak entity type DEPENDENT. The participation of EMPLOYEE is partial, whereas that of DEPENDENT is total.
ER Diagrams - Naming Conventions
When designing a database schema, the choice of names for entity types, attributes, relationship types, and (particularly) roles is not always straightforward. One should choose names that convey, as much as possible, the meanings attached to the different constructs in the schema. We choose to use singular names for entity types, rather than plural ones, because the entity type name applies to each individual entity belonging to that entity type. In our ER diagrams, we will use the convention that entity type and relationship type names are in uppercase letters, attribute names have their initial letter capitalized, and role names are in lowercase letters.
As a general practice, given a narrative description of the database requirements, the nouns appearing in the narrative tend to give rise to entity type names, and the verbs tend to indicate names of relationship types. Attribute names generally arise from additional nouns that describe the nouns corresponding to entity types. Another naming consideration involves choosing binary relationship names to make the ER diagram of the schema readable from left to right and from top to bottom.
Examples - ER DIAGRAMS
ER DIAGRAM FOR BANK DATABASE
UNIT 2
Transaction
A transaction is a unit of work performed on the database. Generally, a transaction reads a value from the database or writes a value to the database. If you are familiar with operating systems, a transaction is analogous to a process.
Although a transaction can both read and write on the database, there are some fundamental differences between these two classes of operations. A read operation does not change the image of the database in any way. But a write operation, whether performed with the intention of inserting, updating or deleting data, changes the image of the database. That is, we may say that these transactions bring the database from an image which existed before the transaction occurred (called the Before Image or BFIM) to an image which exists after the transaction occurred (called the After Image or AFIM).
The Four Properties of Transactions
Every transaction, for whatever purpose it is being used, has the following four properties. Taking the initial letters of these four properties, we collectively call them the ACID Properties.
1. Atomicity: This means that either all of the instructions within the transaction will be reflected in the database, or none of them will be reflected. Say, for example, we have two accounts A and B, each containing Rs 1000/-. We now start a transaction to transfer Rs 100/- from account A to account B.
Read A;
A = A - 100;
Write A;
Read B;
B = B + 100;
Write B;
The transaction has 6 instructions to extract the amount from A and submit it to B. The AFIM will show Rs 900/- in A and Rs 1100/- in B. If the transaction fails partway, for example after Write A but before Write B, the deduction from A must be undone so that the database returns to the BFIM; if the transaction completes successfully, it is committed.
2. Consistency: If we execute a particular transaction in isolation or together with other transactions (i.e. presumably in a multi-programming environment), the transaction will yield the same expected result. More generally, a transaction that starts from a consistent database must leave the database consistent; in the transfer above, the total of A and B is Rs 2000/- both before and after. To give better performance, every database management system supports the execution of multiple transactions at the same time, using CPU time sharing. Concurrently executing transactions may have to deal with the problem of sharable resources, i.e.
resources that multiple transactions are trying to read/write at the same time. For example, we may have a table or a record which two transactions are trying to read or write at the same time. Careful mechanisms are created in order to prevent mismanagement of these sharable resources, so that there is no change in the way a transaction performs. A transaction which deposits Rs 100/- to account A must deposit the same amount whether it is acting alone or in conjunction with another transaction that may be trying to deposit or withdraw some amount at the same time.
3. Isolation: In case multiple transactions are executing concurrently and trying to access a sharable resource at the same time, the system should create an ordering in their execution so that they do not create any anomaly in the value stored at the sharable resource. There are several ways to achieve this, and the most popular one is using some kind of locking mechanism. Again, if you are familiar with operating systems, you should remember semaphores: how a process uses one to mark a resource as busy before starting to use it, and how it releases the resource after the usage is over. Other processes intending to access that same resource must wait during this time. Locking is very similar. A transaction must first lock the data item that it wishes to access, and release the lock when the access is no longer required. Once a transaction locks the data item, other transactions wishing to access the same data item must wait until the lock is released.
4. Durability: It states that once a transaction has completed, the changes it has made should be permanent. As we have seen in the explanation of the Atomicity property, the transaction, if it completes successfully, is committed.
Once the COMMIT is done, the changes which the transaction has made to the database are written into permanent storage. So, after the transaction has been committed successfully, there is no question of any loss of information even if the power fails. Committing a transaction guarantees that the AFIM has been reached.
There are several ways Atomicity and Durability can be implemented. One of them is called Shadow Copy. In this scheme a database pointer is used to point to the BFIM of the database. During the transaction, all the temporary changes are recorded into a Shadow Copy, which is an exact copy of the original database plus the changes made by the transaction, which is the AFIM. Now, if the transaction is required to COMMIT, then the database pointer is updated to point to the AFIM copy, and the BFIM copy is discarded. On the other hand, if the transaction is not committed, then the database pointer is not updated. It keeps pointing to the BFIM, and the AFIM is discarded. This is a simple scheme, but it takes a lot of memory space and time to implement.
Atomicity and Durability are closely related, since both are usually enforced by the same recovery mechanism, just as Consistency and Isolation are closely related, since both depend on how concurrent access is controlled. They remain four distinct properties.
Transaction States
There are the following six states in which a transaction may exist:
1. Active: The initial state, when the transaction has just started execution.
2. Partially Committed: When the transaction has executed properly up to its final instruction, it is heading towards its COMMIT POINT. The values generated during the execution are all stored in volatile storage. At this point, some recovery protocols need to ensure that a system failure will not result in an inability to record the changes of the transaction permanently (usually by recording changes in the system log, discussed in the next section).
Once this check is successful, the transaction is said to have reached its commit point and enters the committed state.
3. Failed: The transaction enters this state if it fails for some reason. The temporary values are no longer required, and the transaction is set to ROLLBACK. It means that any change made to the database by this transaction up to the point of the failure must be undone. If the failed transaction has withdrawn Rs 100/- from account A, then the ROLLBACK operation should add Rs 100/- to account A.
4. Aborted: When the ROLLBACK operation is over, the database reaches the BFIM. The transaction is now said to have been aborted.
5. Committed: If no failure occurs, then the transaction reaches the COMMIT POINT. All the temporary values are written to stable storage and the transaction is said to have been committed.
6. Terminated: Whether committed or aborted, the transaction finally reaches this state.
The whole process can be described using the following diagram:
Concurrent Execution
A schedule is a collection of transactions executed as a unit, together with the order in which their instructions run. Depending upon how these transactions are arranged within a schedule, a schedule can be of two types:
Serial: The transactions are executed one after another, in a non-preemptive manner.
Concurrent: The transactions are executed in a preemptive, time-shared manner.
In a serial schedule, there is no question of sharing a single data item among many transactions, because not more than a single transaction is executing at any point of time. However, a serial schedule is inefficient in the sense that the transactions suffer from longer waiting and response times, as well as low resource utilization. In a concurrent schedule, CPU time is shared among two or more transactions in order to run them concurrently.
However, this creates the possibility that more than one transaction may need to access a single data item for read/write purposes, and the database could contain inconsistent values if such accesses are not handled properly. Let us explain with the help of an example.
Let us consider two transactions T1 and T2, whose instruction sets are given below. T1 is the same as we have seen earlier, while T2 is a new transaction.
T1: Read A; A = A - 100; Write A; Read B; B = B + 100; Write B;
T2: Read A; Temp = A * 0.1; Read C; C = C + Temp; Write C;
T2 is a new transaction which deposits to account C 10% of the amount in account A.
If we prepare a serial schedule, then either T1 will completely finish before T2 can begin, or T2 will completely finish before T1 can begin. However, if we want to create a concurrent schedule, then some Context Switching needs to be made, so that some portion of T1 will be executed, then some portion of T2 will be executed, and so on. For example, say we have prepared the following concurrent schedule.
T1                    T2
Read A;
A = A - 100;
Write A;
                      Read A;
                      Temp = A * 0.1;
                      Read C;
                      C = C + Temp;
                      Write C;
Read B;
B = B + 100;
Write B;
No problem here. We have made some Context Switching in this schedule, the first one after executing the third instruction of T1, and the second after executing the last statement of T2. T1 first deducts Rs 100/- from A and writes the new value of Rs 900/- into A. T2 reads the value of A, calculates the value of Temp to be Rs 90/- and adds the value to C. The remaining part of T1 is executed and Rs 100/- is added to B.
It is clear that a proper Context Switching is very important in order to maintain the Consistency and Isolation properties of the transactions.
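The concurrent schedule above can be replayed step by step. The following is a minimal sketch, not a real DBMS: it assumes a toy model in which each transaction keeps private local variables and only Read/Write steps touch the shared database. The helper names (`read`, `write`, `compute`, `run`), the opening balance of C, and the use of whole-rupee arithmetic for the 10% are assumptions made for illustration.

```python
# Toy model (assumed for illustration): each transaction has private local
# variables; only Read and Write steps touch the shared database.

def read(item):
    def step(db, local):
        local[item] = db[item]          # copy the stored value into the workspace
    return step

def write(item):
    def step(db, local):
        db[item] = local[item]          # publish the local value to the database
    return step

def compute(target, fn):
    def step(db, local):
        local[target] = fn(local)       # purely local calculation
    return step

T1 = [read("A"), compute("A", lambda v: v["A"] - 100), write("A"),
      read("B"), compute("B", lambda v: v["B"] + 100), write("B")]
T2 = [read("A"), compute("Temp", lambda v: v["A"] // 10),   # 10% of A, whole rupees
      read("C"), compute("C", lambda v: v["C"] + v["Temp"]), write("C")]

def run(order, initial):
    """Run one step of the named transaction for each entry in `order`."""
    db = dict(initial)
    programs = {"T1": iter(T1), "T2": iter(T2)}
    local = {"T1": {}, "T2": {}}
    for name in order:
        next(programs[name])(db, local[name])
    return db

start = {"A": 1000, "B": 1000, "C": 0}   # C = 0 is an assumed opening balance
serial = run(["T1"] * 6 + ["T2"] * 5, start)
concurrent = run(["T1"] * 3 + ["T2"] * 5 + ["T1"] * 3, start)
print(serial)
print(concurrent)
```

Under these assumptions both runs end with the same balances, which is what it means for this interleaving to behave like the serial order T1 followed by T2.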
But let us take another example where a wrong Context Switching can bring about disaster. Consider the following example involving the same T1 and T2.
T1                    T2
Read A;
A = A - 100;
                      Read A;
                      Temp = A * 0.1;
                      Read C;
                      C = C + Temp;
                      Write C;
Write A;
Read B;
B = B + 100;
Write B;
This schedule is wrong, because we have made the switching after the second instruction of T1. The result is very confusing. If we consider accounts A and B both containing Rs 1000/- each, and T1 is meant to take effect before T2, then this schedule should have left Rs 900/- in A and Rs 1100/- in B, and added Rs 90/- to C (as C should be increased by 10% of the amount in A). But in this wrong schedule, the Context Switching is performed before the new value of Rs 900/- has been written to A. T2 reads the old value of A, which is still Rs 1000/-, and deposits Rs 100/- in C. C makes an unjust gain of Rs 10/- out of nowhere. (This outcome is what the serial order T2 followed by T1 would produce; it is an error with respect to the intended order T1 followed by T2.)
In the above example, we detected the error simply by examining the schedule and applying common sense. But there must be some well-formed rules regarding how to arrange instructions of the transactions to create error-free concurrent schedules. This brings us to our next topic, the concept of Serializability.
Serializability
When several concurrent transactions are trying to access the same data item, the instructions within these concurrent transactions must be ordered in some way so that there is no problem in accessing and releasing the shared data item. There are two aspects of serializability which are described here:
1. Conflict Serializability
Two instructions of two different transactions may want to access the same data item in order to perform a read/write operation. Conflict Serializability deals with detecting whether the instructions are conflicting in any way, and specifying the order in which these two instructions will be executed in case there is any conflict.
A conflict arises if at least one (or both) of the instructions is a write operation. The following rules are important in Conflict Serializability:
1. If two instructions of the two concurrent transactions are both read operations, then they are not in conflict, and can be allowed to take place in any order.
2. If one of the instructions wants to perform a read operation and the other instruction wants to perform a write operation, then they are in conflict, hence their ordering is important. If the read instruction is performed first, then it reads the old value of the data item and, after the reading is over, the new value of the data item is written. If the write instruction is performed first, then it updates the data item with the new value and the read instruction reads the newly updated value.
3. If both the instructions are write operations, then they are in conflict. Neither instruction reads the value written by the other, but their order still matters, because the value that persists in the data item after the schedule is over is the one written by the instruction that performed the last write.
It may happen that we want to execute the same set of transactions in a different schedule on another day. Keeping these rules in mind, we may sometimes alter parts of one schedule (S1) to create another schedule (S2) by swapping only the non-conflicting parts of the first schedule. The conflicting parts cannot be swapped in this way, because the ordering of the conflicting instructions is important and cannot be changed in any other schedule that is derived from the first. If these two schedules are made of the same set of transactions, then both S1 and S2 would yield the same result, provided the conflict resolution rules are maintained while creating the new schedule. In that case the schedules S1 and S2 would be called Conflict Equivalent. A schedule that is conflict equivalent to some serial schedule is called conflict serializable.
2.
View Serializability
This is another type of serializability that can be derived by creating another schedule out of an existing schedule, involving the same set of transactions. The two schedules are called View Equivalent if the following rules are followed while creating the second schedule out of the first; a schedule that is view equivalent to some serial schedule is called View Serializable. Let us consider that the transactions T1 and T2 are being arranged into two different schedules S1 and S2, which we want to be View Equivalent, and that both T1 and T2 want to access the same data item.
1. If in S1, T1 reads the initial value of the data item, then in S2 also, T1 should read the initial value of that same data item.
2. If in S1, T1 writes a value in the data item which is read by T2, then in S2 also, T1 should write the value in the data item before T2 reads it.
3. If in S1, T1 performs the final write operation on that data item, then in S2 also, T1 should perform the final write operation on that data item.
Except in these three cases, any alteration is possible while creating S2 by modifying S1.
Relational data model
The relational data model is the primary data model, used widely around the world for data storage and processing. This model is simple and has all the properties and capabilities required to process data with storage efficiency.
Relational Model Concepts
Tables: In the relational data model, relations are saved in the format of tables. This format stores the relation among entities. A table has rows and columns, where rows represent records and columns represent attributes.
Tuple: A single row of a table, which contains a single record for that relation, is called a tuple.
Relation instance: A finite set of tuples in the relational database system represents a relation instance. Relation instances do not have duplicate tuples.
Relation schema: This describes the relation name (table name), attributes and their names.
Relation key: One or more attributes that can identify each row in the relation (table) uniquely are called the relation key.
Attribute domain: Every attribute has some pre-defined scope of values, known as the attribute domain.
Relational Model Constraints
Every relation has some conditions that must hold for it to be a valid relation. These conditions are called Relational Integrity Constraints. There are three main integrity constraints:
■ Key constraints
■ Domain constraints
■ Referential integrity constraints
Key Constraints
There must be at least one minimal subset of attributes in the relation which can identify a tuple uniquely. This minimal subset of attributes is called the key for that relation. If there is more than one such minimal subset, these are called candidate keys. Key constraints enforce that:
■ in a relation with a key attribute, no two tuples can have identical values for the key attributes.
■ a key attribute cannot have NULL values.
Key constraints are also referred to as Entity Constraints.
Domain Constraints
Attributes have specific values in real-world scenarios. For example, age can only be a positive integer. The same constraints are applied to the attributes of a relation. Every attribute is bound to have a specific range of values. For example, age cannot be less than zero, and a telephone number cannot contain characters other than the digits 0-9.
Referential Integrity Constraints
This integrity constraint works on the concept of a Foreign Key. A key attribute of a relation can be referred to in another relation, where it is called a foreign key. The referential integrity constraint states that if a relation refers to a key attribute of a different or the same relation, that key element must exist.
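The three kinds of integrity constraint described above can be tried out with Python's built-in `sqlite3` module. This is only a sketch under assumptions: the tables DEPARTMENT and EMPLOYEE, their columns (Dno, Dname, Ssn, Age) and the sample rows are hypothetical, and the helper `violates` is introduced here purely for illustration. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite leaves foreign-key checks off by default

# Hypothetical schema: key, domain (CHECK) and referential constraints.
conn.execute("CREATE TABLE DEPARTMENT (Dno INTEGER PRIMARY KEY, Dname TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        Ssn TEXT PRIMARY KEY NOT NULL,           -- key constraint: unique and not NULL
        Age INTEGER CHECK (Age >= 0),            -- domain constraint
        Dno INTEGER REFERENCES DEPARTMENT(Dno)   -- referential integrity constraint
    )
""")
conn.execute("INSERT INTO DEPARTMENT VALUES (5, 'Research')")
conn.execute("INSERT INTO EMPLOYEE VALUES ('111', 30, 5)")

def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

results = {
    "duplicate key": violates("INSERT INTO EMPLOYEE VALUES ('111', 41, 5)"),
    "null key": violates("INSERT INTO EMPLOYEE VALUES (NULL, 41, 5)"),
    "negative age": violates("INSERT INTO EMPLOYEE VALUES ('222', -1, 5)"),
    "missing department": violates("INSERT INTO EMPLOYEE VALUES ('333', 25, 9)"),
    "valid row": violates("INSERT INTO EMPLOYEE VALUES ('444', 25, 5)"),
}
print(results)
```

Each of the first four inserts breaks one of the constraints and is rejected; the last one satisfies all of them and is accepted.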
Relational database schema
A relational database schema helps you to organize and understand the structure of a database. This is particularly useful when designing a new database, modifying an existing database to support more functionality, or building integration between databases.
Creating the Schema
There are two steps to creating a relational database schema: creating the logical schema and creating the physical schema. The logical schema depicts the structure of the database, showing the tables, columns and relationships with other tables in the database, and can be created with modeling tools or spreadsheet and drawing software. The physical schema is created by actually generating the tables, columns and relationships in the relational database management software (RDBMS). Most modeling tools can automate the creation of the physical schema from the logical schema, but it can also be done manually.
Modifying the Schema
The structure of a relational database will inevitably change over time as data needs change. It is just as important to document changes to your relational database schema as it was to document the original design; otherwise it becomes increasingly difficult to update or integrate the database. Be sure to save copies of previous configurations, so that changes can be removed if problems occur.
The Relational Algebra and Relational Calculus
In this chapter we discuss the two formal languages for the relational model: the relational algebra and the relational calculus. The practical language for the relational model is the SQL standard. The basic set of operations for the relational model is the relational algebra. These operations enable a user to specify basic retrieval requests as relational algebra expressions. The result of a retrieval is a new relation, which may have been formed from one or more relations.
The algebra operations thus produce new relations, which can be further manipulated using operations of the same algebra. A sequence of relational algebra operations forms a relational algebra expression, whose result will also be a relation that represents the result of a database query (or retrieval request). Whereas the algebra defines a set of operations for the relational model, the relational calculus provides a higher-level declarative language for specifying relational queries. A relational calculus expression creates a new relation. In a relational calculus expression, there is no order of operations to specify how to retrieve the query result—only what information the result should contain. This is the main distinguishing feature between relational algebra and relational calculus. The relational calculus is important because it has a firm basis in mathematical logic and because the standard query language (SQL) for RDBMSs has some of its foundations in a variation of relational calculus known as the tuple relational calculus. Relational algebra operations The relational algebra is often considered to be an integral part of the relational data model. Its operations can be divided into two groups. One group includes set operations from mathematical set theory; these are applicable because each relation is defined to be a set of tuples in the formal relational model. Set operations include UNION, INTERSECTION, SET DIFFERENCE, and CARTESIAN PRODUCT (also known as CROSS PRODUCT). The other group consists of operations developed specifically for relational databases—these include SELECT, PROJECT, and JOIN, among others. First, we describe the SELECT and PROJECT operations because they are unary operations that operate on single relations. 
Then we discuss set operations, followed by JOIN and other complex binary operations, which operate on two tables by combining related tuples (records) based on join conditions.
Unary Relational Operations: SELECT, PROJECT and RENAME
1. SELECT
The SELECT operation (denoted by σ (sigma)) is used to select a subset of the tuples from a relation based on a selection condition. The selection condition acts as a filter:
- It keeps only those tuples that satisfy the qualifying condition.
- Tuples satisfying the condition are selected, whereas the other tuples are discarded (filtered out).
In general, the SELECT operation is denoted by
σ <selection condition>(R)
where the symbol σ (sigma) is used to denote the SELECT operator and the selection condition is a Boolean expression (condition) specified on the attributes of relation R. The relation resulting from the SELECT operation has the same attributes as R. The Boolean expression specified in <selection condition> is made up of a number of clauses of the form
<attribute name> <comparison op> <constant value>
or
<attribute name> <comparison op> <attribute name>
where <attribute name> is the name of an attribute of R, <comparison op> is normally one of the operators {=, <, ≤, >, ≥, ≠}, and <constant value> is a constant value from the attribute domain. Clauses can be connected by the standard Boolean operators and, or, and not to form a general selection condition.
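The SELECT operation just described can be sketched in a few lines of Python. This is an illustrative model only: a relation is assumed to be a list of dictionaries, the function name `select` and the relation name PERSON are hypothetical, and the sample rows (Name, Age, Weight) are illustrative data.

```python
# A relation modelled as a list of tuples (dicts); the rows are illustrative.
PERSON = [
    {"Name": "Harry", "Age": 34, "Weight": 80},
    {"Name": "Sally", "Age": 28, "Weight": 64},
    {"Name": "George", "Age": 29, "Weight": 70},
    {"Name": "Helena", "Age": 54, "Weight": 54},
    {"Name": "Peter", "Age": 34, "Weight": 80},
]

def select(relation, condition):
    """sigma<condition>(relation): keep the tuples for which condition(t) holds."""
    return [t for t in relation if condition(t)]

# Clause of the form <attribute name> <comparison op> <constant value>
older = select(PERSON, lambda t: t["Age"] >= 34)

# Clause of the form <attribute name> <comparison op> <attribute name>
same = select(PERSON, lambda t: t["Age"] == t["Weight"])

# Clauses combined with the Boolean operators and / not
heavy_and_older = select(PERSON, lambda t: t["Age"] >= 34 and not t["Weight"] < 80)

older_names = [t["Name"] for t in older]
same_names = [t["Name"] for t in same]
heavy_names = [t["Name"] for t in heavy_and_older]
print(older_names, same_names, heavy_names)
```

The result of each call has the same attributes as the input relation; only the number of tuples changes.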
Properties of SELECT Operation
SELECT (σ) is commutative:
σ<condition1>(σ<condition2>(R)) = σ<condition2>(σ<condition1>(R))
A cascade of SELECT operations may be replaced by a single selection with a conjunction of all the conditions:
σ<cond1>(σ<cond2>(σ<cond3>(R))) = σ<cond1> AND <cond2> AND <cond3>(R)
Select-Example:
Original relation:
Name    Age  Weight
Harry   34   80
Sally   28   64
George  29   70
Helena  54   54
Peter   34   80
First result (these rows are consistent with a condition such as Age ≥ 34):
Name    Age  Weight
Harry   34   80
Helena  54   54
Peter   34   80
Second result (this row is consistent with a condition such as Age = Weight):
Name    Age  Weight
Helena  54   54
2. The PROJECT Operation
The PROJECT operation is denoted by π (pi). This operation keeps certain columns (attributes) from a relation and discards the other columns. PROJECT creates a vertical partitioning:
- The list of specified columns (attributes) is kept in each tuple.
- The other attributes in each tuple are discarded.
For example, π subject, author (Books) projects the columns named subject and author from the relation Books.
We may want to apply several relational algebra operations one after the other:
- Either we can write the operations as a single relational algebra expression by nesting the operations, or
- We can apply one operation at a time and create intermediate result relations.
Example: To retrieve the first name, last name, and salary of all employees who work in department number 5, we must apply a select and a project operation. We can write a single relational algebra expression as follows:
π FNAME, LNAME, SALARY (σ DNO=5 (EMPLOYEE))
3.
RENAME Operation
The RENAME operator is denoted by ρ (rho). In some cases, we may want to rename the attributes of a relation, or the relation name, or both. This is:
- Useful when a query requires multiple operations
- Necessary in some cases (see the JOIN operation later)
The general RENAME operation can be expressed in any of the following forms:
■ ρS(R) changes the relation name only, to S.
■ ρ(B1, B2, …, Bn)(R) changes the column (attribute) names only, to B1, B2, …, Bn.
■ ρS(B1, B2, …, Bn)(R) changes both: the relation name to S, and the column (attribute) names to B1, B2, …, Bn.
A rename is a unary operation written as ρ a/b (R), where the result is identical to R except that the b attribute in all tuples is renamed to an a attribute. This is simply used to rename an attribute of a relation or the relation itself. To rename the 'isFriend' attribute to 'isBusinessContact' in a relation, ρ isBusinessContact/isFriend might be used.
Relational Algebra Operations from Set Theory
The UNION, INTERSECTION, MINUS (Set Difference) and the CARTESIAN PRODUCT (CROSS PRODUCT) Operations
The next group of relational algebra operations is the standard mathematical operations on sets. Several set-theoretic operations are used to merge the elements of two sets in various ways, including UNION, INTERSECTION, and SET DIFFERENCE (also called MINUS or EXCEPT). These are binary operations; that is, each is applied to two sets (of tuples). When these operations are adapted to relational databases, the two relations on which any of these three operations are applied must have the same type of tuples; this condition has been called union compatibility or type compatibility.
Two relations R(A1, A2, ..., An) and S(B1, B2, ..., Bn) are said to be union compatible (or type compatible) if they have the same degree n and if dom(Ai) = dom(Bi) for 1 ≤ i ≤ n. This means that the two relations have the same number of attributes and each corresponding pair of attributes has the same domain. We can define the three operations UNION, INTERSECTION, and SET DIFFERENCE on two union-compatible relations R and S as follows:
■ UNION: The result of this operation, denoted by R ∪ S, is a relation that includes all tuples that are either in R or in S or in both R and S. Duplicate tuples are eliminated.
■ INTERSECTION: The result of this operation, denoted by R ∩ S, is a relation that includes all tuples that are in both R and S.
■ SET DIFFERENCE (or MINUS): The result of this operation, denoted by R - S, is a relation that includes all tuples that are in R but not in S.
We will adopt the convention that the resulting relation has the same attribute names as the first relation R. It is always possible to rename the attributes in the result using the rename operator.
The figure illustrates the three operations. The relations STUDENT and INSTRUCTOR in Figure (a) are union compatible, and their tuples represent the names of students and the names of instructors, respectively. The result of the UNION operation in Figure (b) shows the names of all students and instructors. Note that duplicate tuples appear only once in the result. The result of the INTERSECTION operation (Figure (c)) includes only those who are both students and instructors.
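Because a relation is a set of tuples, the three operations map directly onto Python's built-in set operators. The sketch below is illustrative only: the names in STUDENT and INSTRUCTOR are hypothetical sample data, and `union_compatible` is a simplified stand-in that compares the degree and the Python value types of each position rather than real attribute domains.

```python
# Relations modelled as sets of (first_name, last_name) tuples; sample data is hypothetical.
STUDENT = {("Susan", "Yao"), ("Ramesh", "Shah"), ("Johnny", "Kohler")}
INSTRUCTOR = {("John", "Smith"), ("Susan", "Yao"), ("Ramesh", "Shah")}

def union_compatible(r, s):
    """Simplified check: same degree and same value type in every position."""
    shapes = {tuple(type(v) for v in t) for t in r | s}
    return len(shapes) <= 1

assert union_compatible(STUDENT, INSTRUCTOR)

union = STUDENT | INSTRUCTOR            # R ∪ S, duplicates appear only once
intersection = STUDENT & INSTRUCTOR     # R ∩ S
only_students = STUDENT - INSTRUCTOR    # R - S
only_instructors = INSTRUCTOR - STUDENT # S - R

print(sorted(union))
print(sorted(intersection))
print(sorted(only_students), sorted(only_instructors))
```

The two differences contain different tuples, which shows on this small example that the order of the operands matters for MINUS, while UNION and INTERSECTION give the same result either way round.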
Both UNION and INTERSECTION are commutative operations; that is,
R ∪ S = S ∪ R and R ∩ S = S ∩ R
Both UNION and INTERSECTION can be treated as n-ary operations applicable to any number of relations, because both are also associative operations; that is,
R ∪ (S ∪ T) = (R ∪ S) ∪ T and (R ∩ S) ∩ T = R ∩ (S ∩ T)
The MINUS operation is not commutative; that is, in general,
R - S ≠ S - R
Note that INTERSECTION can be expressed in terms of union and set difference as follows:
R ∩ S = ((R ∪ S) - (R - S)) - (S - R)
The CARTESIAN PRODUCT (CROSS PRODUCT) Operation
It is also known as CROSS PRODUCT or CROSS JOIN, and is denoted by × (the cross symbol). This is also a binary set operation, but the relations on which it is applied do not have to be union compatible. In its binary form, this set operation produces a new element by combining every member (tuple) from one relation (set) with every member (tuple) from the other relation (set). In general, the result of R(A1, A2, ..., An) × S(B1, B2, ..., Bm) is a relation Q with degree n + m attributes Q(A1, A2, ..., An, B1, B2, ..., Bm), in that order. The resulting relation Q has one tuple for each combination of tuples, one from R and one from S. Hence, if R has nR tuples (denoted as |R| = nR), and S has nS tuples, then R × S will have nR * nS tuples. The n-ary CARTESIAN PRODUCT operation is an extension of the above concept, which produces new tuples by concatenating all possible combinations of tuples from n underlying relations. The CARTESIAN PRODUCT operation applied by itself is generally meaningless; it becomes useful when it is followed by a selection that matches related tuples.
CARTESIAN PRODUCT example
Binary Relational Operations: JOIN and DIVISION
JOIN Operator
JOIN is used to combine related tuples from two relations. In its simplest form the JOIN operator is just the cross product of the two relations.
As the join becomes more complex, tuples are removed from the cross product to make the result of the join more meaningful. JOIN allows you to evaluate a join condition between the attributes of the relations on which the join is undertaken. The notation used is
R ⋈<join condition> S
JOIN Example
Equijoin: The most common use of JOIN involves join conditions with equality comparisons only. Such a JOIN, where the only comparison operator used is =, is called an EQUIJOIN. Both previous examples were EQUIJOINs. Notice that in the result of an EQUIJOIN we always have one or more pairs of attributes that have identical values in every tuple.
Natural Join: It is denoted by *. Invariably the JOIN involves an equality test, and thus is often described as an equi-join. Such joins result in two attributes in the resulting relation having exactly the same value. A natural join will remove the duplicate attribute(s). In most systems a natural join will require that the attributes have the same name to identify the attribute(s) to be used in the join. This may require a renaming mechanism. If you do use natural joins, make sure that the relations do not have two attributes with the same name by accident.
Theta Join: A join operator with a general join condition (for example, a condition using the operators =, <, ≤, >, ≥, ≠) is called a THETA join.
OUTER JOINs
Notice that much of the data is lost when applying a join to two relations. In some cases this lost data might hold useful information. An outer join retains the information that would have been lost from the tables, replacing missing data with nulls. Outer union can be used to calculate the union of two relations that are partially union compatible.
Example: the table R

    A  B
    1  2
    3  4

Example: the table S

    B  C
    4  5
    6  7

The result of an outer union between R and S:

    A     B  C
    1     2  null
    3     4  5
    null  6  7

There are three forms of the outer join, depending on which data is to be kept.

- LEFT OUTER JOIN (⟕): keep data from the left-hand table
- RIGHT OUTER JOIN (⟖): keep data from the right-hand table
- FULL OUTER JOIN (⟗): keep data from both tables

Figure: OUTER JOIN example 1 (left/right) (not reproduced)

Figure: OUTER JOIN example 2 (full) (not reproduced)

The DIVISION Operation

The DIVISION operation, denoted by ÷, is useful for a special kind of query that sometimes occurs in database applications. Let r(R) and s(S) be relations. The result of r ÷ s consists of the restrictions of tuples in r to the attribute names unique to R (i.e. in the header of r but not in the header of s) for which all their combinations with tuples in s are present in r.

Figure: DIVISION example (not reproduced)

Functional Dependencies

FDs are constraints on well-formed relations and provide a formalism for describing the structure of a relation.

Definition: A functional dependency (FD) on a relation schema R is a constraint X → Y, where X and Y are subsets of attributes of R.

Definition: An FD is a relationship between an attribute "Y" and a determinant (one or more other attributes) "X" such that for a given value of the determinant the value of the attribute is uniquely defined.
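The definition above ("for a given value of the determinant the value of the attribute is uniquely defined") can be checked mechanically on a relation instance. The following is a minimal sketch, assuming each tuple is stored as a Python dict; the function name satisfies_fd and the sample rows are hypothetical. Note that an instance can only refute an FD: an FD that holds in one instance is not thereby proven to hold for the schema.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in the given rows.

    rows: list of dicts, one per tuple; lhs, rhs: lists of attribute names.
    """
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples that agree on X must also agree on Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical student rows, for illustration only.
students = [
    {"stuId": "S1", "lastName": "Rao", "credits": 15, "status": "Freshman"},
    {"stuId": "S2", "lastName": "Rao", "credits": 45, "status": "Sophomore"},
    {"stuId": "S3", "lastName": "Iyer", "credits": 15, "status": "Freshman"},
    {"stuId": "S4", "lastName": "Das", "credits": 20, "status": "Freshman"},
]

print(satisfies_fd(students, ["stuId"], ["lastName"]))   # True
print(satisfies_fd(students, ["lastName"], ["stuId"]))   # False: two students named Rao
print(satisfies_fd(students, ["credits"], ["status"]))   # True
print(satisfies_fd(students, ["status"], ["credits"]))   # False: Freshman with 15 and 20 credits
```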
For an FD written X → Y:

- X is a determinant
- X determines Y
- Y is functionally dependent on X
- X → Y is trivial if Y ⊆ X

Definition: An FD X → Y is satisfied in an instance r of R if, for every pair of tuples t and s, whenever t and s agree on all attributes in X they also agree on all attributes in Y.

A key constraint is a special kind of functional dependency in which all attributes of the relation occur on the right-hand side of the FD:

    SSN → SSN, Name, Address

Example Functional Dependencies

Let R be NewStudent(stuId, lastName, major, credits, status, socSecNo). FDs in R include:

- {stuId} → {lastName}, but not the reverse
- {stuId} → {lastName, major, credits, status, socSecNo, stuId}
- {socSecNo} → {stuId, lastName, major, credits, status, socSecNo}
- {credits} → {status}, but not {status} → {credits}

Further examples:

- ZipCode → AddressCity: 16652 is Huntingdon's ZIP
- ArtistName → BirthYear: Picasso was born in 1881
- Autobrand → Manufacturer, Engine type: Pontiac is built by General Motors with gasoline engine
- Author, Title → PublDate: Shakespeare's Hamlet was published in 1600

Trivial Functional Dependency

The FD X → Y is trivial if the set Y is a subset of the set X. Examples: if A and B are attributes of R, then

    {A} → {A}
    {A,B} → {A}
    {A,B} → {B}
    {A,B} → {A,B}

are all trivial FDs and do not contribute to the evaluation of normalization.

Composite Key: A composite key is a primary key composed of multiple columns.

Functional Dependency: The value of one column depends on the value of another column, so that if the value of the determining column changes, the value of the dependent column may have to change as well. For example, Supplier Address is functionally dependent on supplier name. If the supplier's name is changed in a record, we need to change the supplier address as well.
    S.Supplier → S.SupplierAddress

"In our S table, the supplier address column is functionally dependent on the supplier column."

Partial Functional Dependency: A non-key column is dependent on some, but not all, of the columns in a composite primary key. In the supplier example, with the composite key (Gadgets, Supplier), Supplier Address depends on Supplier alone and is therefore only partially dependent on the key.

Transitive Dependency: A transitive dependency is a type of functional dependency in which the value in a non-key column is determined by the value in another non-key column.

Terms and definitions:

Determinant: For any relation R, the set of attributes X in the functional dependency X → Y is the determinant group for the dependency.

Full functional dependence: An attribute set X is fully functionally dependent on another attribute set Y if X is functionally dependent on the whole set of Y, but not on any proper subset of Y.

Functional dependence: For any relation R, attribute X is functionally dependent on attribute Y if, for every valid instance, the value of Y determines the value of X. The symbolic notation for this is Y → X.

Multivalued dependence: For any relation R with attribute set (X,Y,Z), the set X multivalue determines the set Y if, for every pair of tuples containing duplicates in X, the instance also contains the pair of tuples obtained by interchanging the Y values in the original pair. The symbolic notation for this is X ⇒ Y (many texts write X ↠ Y).

Transitive dependence: Transitive dependencies are dependencies that exist among three or more attributes in a relation in such a way that one attribute determines a third by way of an intermediate attribute. For example, consider the relation R(X,Y,Z), where X is the primary key.
Attribute Z is transitively dependent on attribute X if attribute Y satisfies the following dependencies, where the symbol ↛ indicates "does not determine":

- X → Y
- Y → Z
- Y ↛ X

Normalisation

While designing a database out of an entity-relationship model, the main problem existing in that "raw" database is redundancy. Redundancy is storing the same data item in more than one place. Redundancy creates several problems, such as the following:

1. Extra storage space: storing the same data in many places takes a large amount of disk space.
2. Entering the same data more than once during data insertion.
3. Deleting data from more than one place during deletion.
4. Modifying data in more than one place.
5. Anomalies may occur in the database if insertion, deletion, modification etc. are not done properly. This creates inconsistency and unreliability in the database.

To solve this problem, the "raw" database needs to be normalized. This is a step-by-step process of removing different kinds of redundancy and anomaly. At each step a specific rule is followed to remove a specific kind of impurity, in order to give the database a slim and clean look.

Un-Normalized Form (UNF)

If a table contains non-atomic values in any row, it is said to be in UNF. An atomic value is something that cannot be further decomposed. A non-atomic value, as the name suggests, can be further decomposed and simplified. Consider the following table:

    Emp-Id  Emp-Name  Month  Sales  Bank-Id  Bank-Name
    E01     AA        Jan    1000   B01      SBI
                      Feb    1200
                      Mar     850
    E02     BB        Jan    2200   B02      UTI
                      Feb    2500
    E03     CC        Jan    1700   B01      SBI
                      Feb    1800
                      Mar    1850
                      Apr    1725

In the sample table above, there are multiple occurrences of rows under each key Emp-Id. Although it is considered to be the primary key, Emp-Id cannot uniquely identify any single row.
Further, each primary key value points to a variable-length record (3 entries for E01, 2 for E02 and 4 for E03).

WHY WE NEED NORMALIZATION?

Normalization is a goal of a well-designed relational database. It is a step-by-step set of rules by which data is put in its simplest form. We normalize a relational database for the following reasons:

- To minimize data redundancy, i.e. no unnecessary duplication of data.
- To make the database structure flexible, i.e. it should be possible to add new data values and rows without reorganizing the database structure.
- To keep data consistent throughout the database, i.e. it should not suffer from the following anomalies:
  - Insert anomaly: a row cannot be inserted because some of the data it needs is not yet available, which would force null values into key attributes. Null values in keys should be avoided. This kind of anomaly can seriously damage a database.
  - Update anomaly: this is due to data redundancy, i.e. multiple occurrences of the same value in a column. Every copy has to be changed, which is inefficient and leaves the data inconsistent if one copy is missed.
  - Deletion anomaly: deleting a row also deletes data that is not stored elsewhere. This could result in the loss of vital data.
- To make complex queries required by the user easy to handle.
- When a relation is decomposed into smaller relations with fewer attributes, the resulting relations, whenever joined, must produce the same relation without any extra rows. The join operations can be performed in any order. This is known as lossless join decomposition.
- The resulting relations (tables) obtained on normalization should have properties such as: each row is identified by a unique key, there are no repeating groups, columns are homogeneous, and each column is assigned a unique name.

ADVANTAGES OF NORMALIZATION

The following are the advantages of normalization.

- More efficient data structure.
- Avoids redundant fields or columns.
- More flexible data structure, i.e. we should be able to add new rows and data values easily.
- Better understanding of data.
- Ensures that distinct tables exist when necessary.
- Easier to maintain data structure, i.e. it is easy to perform operations, and complex queries can be easily handled.
- Minimizes data duplication.
- Close modeling of real world entities, processes and their relationships.

DISADVANTAGES OF NORMALIZATION

The following are the disadvantages of normalization.

- You cannot start building the database before you know what the user needs.
- On normalizing relations to higher normal forms, i.e. 4NF and 5NF, performance degrades.
- Normalizing relations of higher degree is a very time-consuming and difficult process.
- Careless decomposition may lead to a bad database design, which may lead to serious problems.

First Normal Form (1NF)

A relation is said to be in 1NF if it contains no non-atomic values and each row provides a unique combination of values. The above table in UNF can be processed to create the following table in 1NF.

    Emp-Id  Emp-Name  Month  Sales  Bank-Id  Bank-Name
    E01     AA        Jan    1000   B01      SBI
    E01     AA        Feb    1200   B01      SBI
    E01     AA        Mar     850   B01      SBI
    E02     BB        Jan    2200   B02      UTI
    E02     BB        Feb    2500   B02      UTI
    E03     CC        Jan    1700   B01      SBI
    E03     CC        Feb    1800   B01      SBI
    E03     CC        Mar    1850   B01      SBI
    E03     CC        Apr    1725   B01      SBI

As you can see, each row now contains a unique combination of values. Unlike in UNF, this relation contains only atomic values, i.e. the rows cannot be further decomposed, so the relation is now in 1NF.

Second Normal Form (2NF)

A relation is said to be in 2NF if it is already in 1NF and every non-key attribute fully depends on the primary key of the relation. Speaking inversely, if a table has some attribute that does not depend on the whole primary key of that table, then it is not in 2NF.
Let us explain. Emp-Id is treated as the primary key of the above relation (strictly, a single row is identified only by Emp-Id together with Month). Emp-Name, Month, Sales and Bank-Name all depend upon Emp-Id. But the attribute Bank-Name depends on Bank-Id, which is not the primary key of the table. So the table is in 1NF, but not in 2NF. If this portion is moved into another related relation, the design comes to 2NF.

    Emp-Id  Emp-Name  Month  Sales  Bank-Id
    E01     AA        Jan    1000   B01
    E01     AA        Feb    1200   B01
    E01     AA        Mar     850   B01
    E02     BB        Jan    2200   B02
    E02     BB        Feb    2500   B02
    E03     CC        Jan    1700   B01
    E03     CC        Feb    1800   B01
    E03     CC        Mar    1850   B01
    E03     CC        Apr    1725   B01

    Bank-Id  Bank-Name
    B01      SBI
    B02      UTI

After moving this portion into another relation, we store a smaller amount of data in two relations without any loss of information. There is also a significant reduction in redundancy.

Third Normal Form (3NF)

A relation is said to be in 3NF if it is already in 2NF and there exists no transitive dependency in that relation. Speaking inversely, if a table contains a transitive dependency, then it is not in 3NF, and the table must be split to bring it into 3NF.

What is a transitive dependency? Within a relation, if we see

    A → B [B depends on A]
    and B → C [C depends on B]

then we may derive

    A → C [C depends on A]

Such derived dependencies hold well in most situations. For example, if we have

    Roll → Marks
    and Marks → Grade

then we may safely derive Roll → Grade. This third dependency was not originally specified, but we have derived it. The derived dependency is called a transitive dependency when such a dependency becomes improbable.
For example, we have been given

    Roll → City
    and City → STDCode

If we try to derive Roll → STDCode, it becomes a transitive dependency, because obviously the STDCode of a city cannot depend on the roll number issued by a school or college. In such a case the relation should be broken into two, each containing one of these two dependencies:

    Roll → City
    and City → STDCode

Boyce-Codd Normal Form (BCNF)

A relation is said to be in BCNF if it is already in 3NF and the left-hand side of every nontrivial functional dependency is a candidate key (more precisely, a superkey). A relation which is in 3NF is almost always in BCNF. A 3NF relation may fail to be in BCNF when the following conditions hold:

1. The candidate keys are composite.
2. There is more than one candidate key in the relation.
3. The candidate keys overlap, i.e. they have some attributes in common.

    Professor Code  Department   Head of Dept.  Percent Time
    P1              Physics      Ghosh          50
    P1              Mathematics  Krishnan       50
    P2              Chemistry    Rao            25
    P2              Physics      Ghosh          75
    P3              Mathematics  Krishnan       100

Consider, as an example, the above relation. It is assumed that:

1. A professor can work in more than one department.
2. The percentage of the time he spends in each department is given.
3. Each department has only one Head of Department.

The relation diagram for the above relation is given as the following (figure not reproduced).

The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted, and we lose the information that Rao is the Head of Department of Chemistry.

The relation is normalized by creating a new relation for Dept. and Head of Dept. and deleting Head of Dept. from the given relation. The normalized relations are shown in the following.
    Professor Code  Department   Percent Time
    P1              Physics      50
    P1              Mathematics  50
    P2              Chemistry    25
    P2              Physics      75
    P3              Mathematics  100

    Department   Head of Dept.
    Physics      Ghosh
    Mathematics  Krishnan
    Chemistry    Rao

See the dependency diagrams for these new relations (figure not reproduced).

Fourth Normal Form (4NF)

When attributes in a relation have a multi-valued dependency, further normalization to 4NF and 5NF is required. Let us first find out what a multi-valued dependency is.

A multi-valued dependency is a typical kind of dependency in which each and every attribute within a relation depends upon the other, yet none of them is a unique primary key.

We will illustrate this with an example. Consider a vendor supplying many items to many projects in an organization. The following are the assumptions:

1. A vendor is capable of supplying many items.
2. A project uses many items.
3. A vendor supplies to many projects.
4. An item may be supplied by many vendors.

A multi-valued dependency exists here because all the attributes depend upon the other and yet none of them is a primary key having a unique value.

    Vendor Code  Item Code  Project No.
    V1           I1         P1
    V1           I2         P1
    V1           I1         P3
    V1           I2         P3
    V2           I2         P1
    V2           I3         P1
    V3           I1         P2
    V3           I1         P3

The given relation has a number of problems. For example:

1. If vendor V1 has to supply to project P2, but the item is not yet decided, then a row with a blank for item code has to be introduced.
2. The information about item I1 is stored twice for vendor V3.

Observe that the relation given is in 3NF and also in BCNF. It still has the problems mentioned above. The problems are reduced by expressing this relation as two relations in the Fourth Normal Form (4NF).
A relation is in 4NF if it has no more than one independent multi-valued dependency, or one independent multi-valued dependency with a functional dependency.

The table can be expressed as the two 4NF relations given below. The fact that vendors are capable of supplying certain items, and the fact that they are assigned to supply some projects, are independently specified in the 4NF relations.

Vendor-Supply

    Vendor Code  Item Code
    V1           I1
    V1           I2
    V2           I2
    V2           I3
    V3           I1

Vendor-Project

    Vendor Code  Project No.
    V1           P1
    V1           P3
    V2           P1
    V3           P2

Fifth Normal Form (5NF)

These relations still have a problem. While defining 4NF we mentioned that all the attributes depend upon each other. While creating the two tables in 4NF, although we have preserved the dependencies between Vendor Code and Item Code in the first table and between Vendor Code and Project No. in the second table, we have lost the relationship between Item Code and Project No. If there were a primary key, this loss of dependency would not have occurred. In order to revive this relationship we must add a new table like the following. Please note that during the entire process of normalization, this is the only step where a new table is created by joining two attributes, rather than splitting them into separate tables.

    Project No.  Item Code
    P1           I1
    P1           I2
    P2           I1
    P3           I1
    P3           I3

Let us finally summarize the normalization steps we have discussed so far.

    Input Relation  Transformation                                          Output Relation
    All relations   Eliminate variable length records. Remove               1NF
                    multi-attribute lines in table.
    1NF             Remove dependency of non-key attributes on part of      2NF
                    a multi-attribute key.
    2NF             Remove dependency of non-key attributes on other        3NF
                    non-key attributes.
    3NF             Remove dependency of an attribute of a multi-attribute  BCNF
                    key on an attribute of another (overlapping)
                    multi-attribute key.
    BCNF            Remove more than one independent multi-valued           4NF
                    dependency from relation by splitting relation.
    4NF             Add one relation relating attributes with               5NF
                    multi-valued dependency.

UNIT 3

SQL-The Relational Database Standards

The name SQL is expanded as Structured Query Language. Originally, SQL was called SEQUEL (Structured English Query Language) and was designed and implemented at IBM Research as the interface for an experimental relational database system called SYSTEM R. SQL is now the standard language for commercial relational DBMSs. A joint effort by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) has led to a standard version of SQL (ANSI 1986), called SQL-86 or SQL1. A revised and much expanded standard called SQL-92 (also referred to as SQL2) was subsequently developed. The next well-recognized standard is SQL:1999, which started out as SQL3. Two later updates to the standard are SQL:2003 and SQL:2006, which added XML features among other updates to the language. Another update in 2008 incorporated more object database features in SQL.

SQL is a comprehensive database language: it has statements for data definitions, queries, and updates. Hence, it is both a DDL and a DML. In addition, it has facilities for defining views on the database, for specifying security and authorization, for defining integrity constraints, and for specifying transaction controls. It also has rules for embedding SQL statements into a general-purpose programming language such as Java, COBOL, or C/C++. The later SQL standards (starting with SQL:1999) are divided into a core specification plus specialized extensions.

SQL Data Definition and Data Types

SQL uses the terms table, row, and column for the formal relational model terms relation, tuple, and attribute, respectively.
We will use the corresponding terms interchangeably. The main SQL command for data definition is the CREATE statement, which can be used to create schemas, tables (relations), and domains (as well as other constructs such as views, assertions, and triggers).

1. Schema and Catalog Concepts in SQL

An SQL schema is identified by a schema name, and includes an authorization identifier to indicate the user or account who owns the schema, as well as descriptors for each element in the schema. Schema elements include tables, constraints, views, domains, and other constructs (such as authorization grants) that describe the schema. A schema is created via the CREATE SCHEMA statement, which can include all the schema elements' definitions. Alternatively, the schema can be assigned a name and authorization identifier, and the elements can be defined later. For example, the following statement creates a schema called COMPANY, owned by the user with authorization identifier 'Jsmith'. Note that each statement in SQL ends with a semicolon.

    CREATE SCHEMA COMPANY AUTHORIZATION 'Jsmith';

In general, not all users are authorized to create schemas and schema elements. The privilege to create schemas, tables, and other constructs must be explicitly granted to the relevant user accounts by the system administrator or DBA.

2. The CREATE TABLE Command in SQL

The CREATE TABLE command is used to specify a new relation by giving it a name and specifying its attributes and initial constraints. The attributes are specified first, and each attribute is given a name, a data type to specify its domain of values, and any attribute constraints, such as NOT NULL. By writing

    CREATE TABLE COMPANY.EMPLOYEE (...);

rather than

    CREATE TABLE EMPLOYEE (...);

we can explicitly (rather than implicitly) make the EMPLOYEE table part of the COMPANY schema.
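A CREATE TABLE statement with a schema-qualified table name can be tried out with Python's standard sqlite3 module, as in the sketch below. Several things here are assumptions made for illustration: SQLite has no CREATE SCHEMA statement, so its built-in schema name main stands in for COMPANY, and the column list and data types are a plausible subset chosen for the example rather than the definition used in these notes.

```python
import sqlite3

# In-memory database; SQLite's default schema is called "main".
conn = sqlite3.connect(":memory:")

# Schema-qualified table name, analogous to COMPANY.EMPLOYEE in the text.
conn.execute("""
    CREATE TABLE main.EMPLOYEE (
        Fname   VARCHAR(15) NOT NULL,
        Lname   VARCHAR(15) NOT NULL,
        Ssn     CHAR(9)     NOT NULL,
        Bdate   DATE,
        Address VARCHAR(30),
        Dno     INT         NOT NULL,
        PRIMARY KEY (Ssn)
    )
""")

# PRAGMA table_info returns one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(EMPLOYEE)")]
print(columns)
```

Because Fname is declared NOT NULL, an INSERT that supplies NULL for it is rejected with sqlite3.IntegrityError.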
The relations declared through CREATE TABLE statements are called base tables (or base relations). This means that the relation and its tuples are actually created and stored as a file by the DBMS.

3. Attribute Data Types and Domains in SQL

The basic data types available for attributes include numeric, character string, bit string, Boolean, date, and time.

1) Numeric data types include integer numbers of various sizes (INTEGER or INT, and SMALLINT) and floating-point (real) numbers of various precision (FLOAT or REAL, and DOUBLE PRECISION). Formatted numbers can be declared by using DECIMAL(i,j), DEC(i,j) or NUMERIC(i,j), where i, the precision, is the total number of decimal digits and j, the scale, is the number of digits after the decimal point. The default for scale is zero, and the default for precision is implementation-defined.

2) Character-string data types are either fixed length (CHAR(n) or CHARACTER(n), where n is the number of characters) or varying length (VARCHAR(n), CHAR VARYING(n) or CHARACTER VARYING(n), where n is the maximum number of characters).

3) Bit-string data types are either of fixed length n (BIT(n)) or varying length (BIT VARYING(n), where n is the maximum number of bits). The default for n, the length of a character string or bit string, is 1.

4) A Boolean data type has the traditional values of TRUE or FALSE. In SQL, because of the presence of NULL values, a three-valued logic is used, so a third possible value for a Boolean data type is UNKNOWN.

5) The DATE data type has ten positions, and its components are YEAR, MONTH, and DAY in the form YYYY-MM-DD.
The TIME data type has at least eight positions, with the components HOUR, MINUTE, and SECOND in the form HH:MM:SS.

6) A timestamp data type (TIMESTAMP) includes the DATE and TIME fields, plus a minimum of six positions for decimal fractions of seconds and an optional WITH TIME ZONE qualifier. Literal values are represented by single-quoted strings preceded by the keyword TIMESTAMP, with a blank space between date and time; for example, TIMESTAMP '2008-09-27 09:12:47.648302'.

7) Another data type related to DATE, TIME, and TIMESTAMP is the INTERVAL data type. This specifies an interval: a relative value that can be used to increment or decrement an absolute value of a date, time, or timestamp. Intervals are qualified to be either YEAR/MONTH intervals or DAY/TIME intervals.

DOMAIN: It is possible to specify the data type of each attribute directly; alternatively, a domain can be declared, and the domain name used with the attribute specification. This makes it easier to change the data type for a domain that is used by numerous attributes in a schema, and improves schema readability. For example, we can create a domain SSN_TYPE by the following statement:

    CREATE DOMAIN SSN_TYPE AS CHAR(9);

We can then use SSN_TYPE in place of CHAR(9).

Specifying Constraints in SQL

This section describes the basic constraints that can be specified in SQL as part of table creation. These include key and referential integrity constraints, restrictions on attribute domains and NULLs, and constraints on individual tuples within a relation.

1. Specifying Attribute Constraints and Attribute Defaults

1) NOT NULL: Because SQL allows NULLs as attribute values, a constraint NOT NULL may be specified if NULL is not permitted for a particular attribute.
This is always implicitly specified for the attributes that are part of the primary key of each relation, but it can be specified for any other attributes whose values are required not to be NULL.

2) DEFAULT: It is also possible to define a default value for an attribute by appending the clause DEFAULT <value> to an attribute definition. The default value is included in any new tuple if an explicit value is not provided for that attribute. The figure illustrates an example of specifying a default manager for a new department and a default department for a new employee. If no default clause is specified, the default value is NULL for attributes that do not have the NOT NULL constraint. Example:

    CREATE TABLE EMPLOYEE ( ..., Dno INT NOT NULL DEFAULT 1, ... );

3) CHECK: Another type of constraint can restrict attribute or domain values using the CHECK clause following an attribute or domain definition. For example, suppose that department numbers are restricted to integer numbers between 1 and 20; this can be written as follows:

    Dnumber INT NOT NULL CHECK (Dnumber > 0 AND Dnumber < 21);

The CHECK clause can also be used in conjunction with the CREATE DOMAIN statement. For example, we can write the following statement:

    CREATE DOMAIN D_NUM AS INTEGER CHECK (D_NUM > 0 AND D_NUM < 21);

We can then use the created domain D_NUM as the attribute type for all attributes that refer to department numbers, such as Dnumber of DEPARTMENT, Dnum of PROJECT, Dno of EMPLOYEE, and so on.

2. Specifying Key and Referential Integrity Constraints

Because keys and referential integrity constraints are very important, there are special clauses within the CREATE TABLE statement to specify them. The PRIMARY KEY clause specifies one or more attributes that make up the primary key of a relation; if the primary key has a single attribute, the clause can follow the attribute directly. Referential integrity is specified via the FOREIGN KEY clause. The UNIQUE clause specifies alternate (secondary) keys.
For example:

    Dname VARCHAR(15) UNIQUE;

Example:

    CREATE TABLE EMPLOYEE (
        Ssn NUMBER(5),
        ...,
        Dno INT NOT NULL DEFAULT 1,
        PRIMARY KEY (Ssn));

Example:

    CREATE TABLE DEPARTMENT (
        Dnumber INT,
        Dname VARCHAR(40) UNIQUE,
        Super_ssn NUMBER(5),
        Mgr_ssn CHAR(9) NOT NULL DEFAULT '888665555',
        PRIMARY KEY (Dnumber),
        FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE(Ssn));

3. CASCADE ON DELETE or ON UPDATE

A referential integrity constraint can be violated when tuples are inserted or deleted, or when a foreign key or primary key attribute value is modified. The default action that SQL takes for an integrity violation is to reject the update operation that will cause a violation, which is known as the RESTRICT option. However, the schema designer can specify an alternative action to be taken by attaching a referential triggered action clause to any foreign key constraint. The options include SET NULL, CASCADE, and SET DEFAULT. An option must be qualified with either ON DELETE or ON UPDATE. We illustrate this with the examples shown in Figure 3.1.

4. Giving Names to Constraints

Figure 3.1 also illustrates how a constraint may be given a constraint name, following the keyword CONSTRAINT. The names of all constraints within a particular schema must be unique. A constraint name is used to identify a particular constraint in case the constraint must be dropped later and replaced with another constraint. Giving names to constraints is optional.

Figure 3.1 (not reproduced)

Here, the database designer chooses ON DELETE SET NULL and ON UPDATE CASCADE for the foreign key Super_ssn of EMPLOYEE. This means that if the tuple
for a supervising employee is deleted, the value of Super_ssn is automatically set to NULL for all employee tuples that were referencing the deleted employee tuple. On the other hand, if the Ssn value for a supervising employee is updated (say, because it was entered incorrectly), the new value is cascaded to Super_ssn for all employee tuples referencing the updated employee tuple.

Basic Retrieval Queries in SQL

SQL has one basic statement for retrieving information from a database: the SELECT statement. The SELECT statement is not the same as the SELECT operation of relational algebra. There are many options and flavors to the SELECT statement in SQL.

1. The SELECT-FROM-WHERE Structure of Basic SQL Queries

The basic form of the SELECT statement, sometimes called a mapping or a select-from-where block, is formed of the three clauses SELECT, FROM, and WHERE and has the following form:

    SELECT <attribute list>
    FROM <table list>
    WHERE <condition>;

where

■ <attribute list> is a list of attribute names whose values are to be retrieved by the query.
■ <table list> is a list of the relation names required to process the query.
■ <condition> is a conditional (Boolean) expression that identifies the tuples to be retrieved by the query.

In SQL, the basic logical comparison operators for comparing attribute values with one another and with literal constants are =, <, <=, >, >=, and <>. These correspond to the relational algebra operators =, <, ≤, >, ≥, and ≠, respectively, and to the C/C++ programming language operators ==, <, <=, >, >=, and !=. The main syntactic differences from C/C++ are the equality and not-equal operators. SQL has additional comparison operators that we will present gradually.

Query 0. Retrieve the birth date and address of the employee(s) whose name is 'John B. Smith'.
    Q0: SELECT Bdate, Address
        FROM EMPLOYEE
        WHERE Fname='John' AND Minit='B' AND Lname='Smith';

This query involves only the EMPLOYEE relation listed in the FROM clause. The query selects the individual EMPLOYEE tuples that satisfy the condition of the WHERE clause, then projects the result on the Bdate and Address attributes listed in the SELECT clause.

The SELECT clause of SQL specifies the attributes whose values are to be retrieved, which are called the projection attributes, and the WHERE clause specifies the Boolean condition that must be true for any retrieved tuple, which is known as the selection condition.

We can think of an implicit tuple variable or iterator in the SQL query ranging or looping over each individual tuple in the EMPLOYEE table and evaluating the condition in the WHERE clause. Only those tuples that satisfy the condition (that is, those tuples for which the condition evaluates to TRUE after substituting their corresponding attribute values) are selected.

Query 1. Retrieve the name and address of all employees who work for the 'Research' department.

    Q1: SELECT Fname, Lname, Address
        FROM EMPLOYEE, DEPARTMENT
        WHERE Dname='Research' AND Dnumber=Dno;

In the WHERE clause of Q1, the condition Dname = 'Research' is a selection condition that chooses the particular tuple of interest in the DEPARTMENT table, because Dname is an attribute of DEPARTMENT. The condition Dnumber = Dno is called a join condition, because it combines two tuples: one from DEPARTMENT and one from EMPLOYEE, whenever the value of Dnumber in DEPARTMENT is equal to the value of Dno in EMPLOYEE.

In general, any number of selection and join conditions may be specified in a single SQL query. A query that involves only selection and join conditions plus projection attributes is known as a select-project-join query.
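Queries Q0 and Q1 above can be run against a small in-memory database using Python's standard sqlite3 module. The sketch below is illustrative only: the table definitions are cut down to the columns the two queries need, and the sample rows are assumed data, not the contents of any table given in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (Dnumber INT PRIMARY KEY, Dname VARCHAR(15));
    CREATE TABLE EMPLOYEE (
        Fname VARCHAR(15), Minit CHAR(1), Lname VARCHAR(15),
        Ssn CHAR(9) PRIMARY KEY, Bdate DATE, Address VARCHAR(30), Dno INT
    );
    -- Assumed sample rows, for illustration only.
    INSERT INTO DEPARTMENT VALUES (5, 'Research'), (4, 'Administration');
    INSERT INTO EMPLOYEE VALUES
        ('John', 'B', 'Smith', '123456789', '1965-01-09', '731 Fondren, Houston, TX', 5),
        ('Franklin', 'T', 'Wong', '333445555', '1955-12-08', '638 Voss, Houston, TX', 5),
        ('Alicia', 'J', 'Zelaya', '999887777', '1968-01-19', '3321 Castle, Spring, TX', 4);
""")

# Q0: selection on EMPLOYEE, projection on Bdate and Address.
q0 = conn.execute("""
    SELECT Bdate, Address
    FROM EMPLOYEE
    WHERE Fname='John' AND Minit='B' AND Lname='Smith'
""").fetchall()

# Q1: selection condition on Dname plus join condition Dnumber = Dno.
q1 = conn.execute("""
    SELECT Fname, Lname, Address
    FROM EMPLOYEE, DEPARTMENT
    WHERE Dname='Research' AND Dnumber=Dno
""").fetchall()

print(q0)
print(sorted(q1))  # sorted because SQL does not fix the row order without ORDER BY
```

With this data, Q0 returns the single row for John B. Smith, and Q1 returns the two employees whose Dno matches the Dnumber of the 'Research' department; Zelaya is excluded by the selection condition.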
The next example is a select-project-join query with two join conditions.

Query 2. For every project located in ‘Stafford’, list the project number, the controlling department number, and the department manager’s last name, address, and birth date.

Q2: SELECT Pnumber, Dnum, Lname, Address, Bdate
    FROM PROJECT, DEPARTMENT, EMPLOYEE
    WHERE Dnum=Dnumber AND Mgr_ssn=Ssn AND Plocation='Stafford';

The join condition Dnum = Dnumber relates a project tuple to its controlling department tuple, whereas the join condition Mgr_ssn = Ssn relates the controlling department tuple to the employee tuple who manages that department. Each tuple in the result will be a combination of one project, one department, and one employee that satisfies the join conditions. The projection attributes are used to choose the attributes to be displayed from each combined tuple.

2. Ambiguous Attribute Names, Aliasing, Renaming, and Tuple Variables

In SQL, the same name can be used for two (or more) attributes as long as the attributes are in different relations. If this is the case, and a multitable query refers to two or more attributes with the same name, we must qualify the attribute name with the relation name to prevent ambiguity. This is done by prefixing the relation name to the attribute name and separating the two by a period.
To illustrate this, suppose that the Dno and Lname attributes of the EMPLOYEE relation were called Dnumber and Name, and the Dname attribute of DEPARTMENT was also called Name; then, to prevent ambiguity, query Q1 would be rephrased as shown in Q1A. We must prefix the attributes Name and Dnumber in Q1A to specify which ones we are referring to, because the same attribute names are used in both relations:

Q1A: SELECT Fname, EMPLOYEE.Name, Address
     FROM EMPLOYEE, DEPARTMENT
     WHERE DEPARTMENT.Name='Research' AND DEPARTMENT.Dnumber=EMPLOYEE.Dnumber;

Fully qualified attribute names can be used for clarity even if there is no ambiguity in attribute names. Q1 is rewritten in this manner as Q1′ below. We can also create an alias for each table name to avoid repeated typing of long table names (see Q8 below).

Q1′: SELECT EMPLOYEE.Fname, EMPLOYEE.Lname, EMPLOYEE.Address
     FROM EMPLOYEE, DEPARTMENT
     WHERE DEPARTMENT.Dname='Research' AND DEPARTMENT.Dnumber=EMPLOYEE.Dno;

The ambiguity of attribute names also arises in the case of queries that refer to the same relation twice, as in the following example.

Query 8. For each employee, retrieve the employee’s first and last name and the first and last name of his or her immediate supervisor.

Q8: SELECT E.Fname, E.Lname, S.Fname, S.Lname
    FROM EMPLOYEE AS E, EMPLOYEE AS S
    WHERE E.Super_ssn=S.Ssn;

In this case, we are required to declare alternative relation names E and S, called aliases or tuple variables, for the EMPLOYEE relation. An alias can follow the keyword AS, as shown in Q8, or it can directly follow the relation name, for example by writing EMPLOYEE E, EMPLOYEE S in the FROM clause of Q8. It is also possible to rename the relation attributes within the query in SQL by giving them aliases.
For example, if we write

EMPLOYEE AS E (Fn, Mi, Ln, Ssn, Bd, Addr, Sex, Sal, Sssn, Dno)

in the FROM clause, Fn becomes an alias for Fname, Mi for Minit, Ln for Lname, and so on.

In Q8, we can think of E and S as two different copies of the EMPLOYEE relation; the first, E, represents employees in the role of supervisees or subordinates; the second, S, represents employees in the role of supervisors. We can now join the two copies.

We can use this alias-naming mechanism in any SQL query to specify tuple variables for every table in the WHERE clause, whether or not the same relation needs to be referenced more than once. In fact, this practice is recommended since it results in queries that are easier to comprehend. For example, we could specify query Q1 as in Q1B:

Q1B: SELECT E.Fname, E.Lname, E.Address
     FROM EMPLOYEE E, DEPARTMENT D
     WHERE D.Dname='Research' AND D.Dnumber=E.Dno;

3. Unspecified WHERE Clause and Use of the Asterisk

We discuss two more features of SQL here. A missing WHERE clause indicates no condition on tuple selection; hence, all tuples of the relation specified in the FROM clause qualify and are selected for the query result. If more than one relation is specified in the FROM clause and there is no WHERE clause, then the CROSS PRODUCT (all possible tuple combinations) of these relations is selected. For example, Query 9 selects all EMPLOYEE Ssns, and Query 10 selects all combinations of an EMPLOYEE Ssn and a DEPARTMENT Dname, regardless of whether the employee works for the department or not.

Queries 9 and 10. Select all EMPLOYEE Ssns (Q9) and all combinations of EMPLOYEE Ssn and DEPARTMENT Dname (Q10) in the database.
Q9: SELECT Ssn
    FROM EMPLOYEE;

Q10: SELECT Ssn, Dname
     FROM EMPLOYEE, DEPARTMENT;

It is extremely important to specify every selection and join condition in the WHERE clause; if any such condition is overlooked, incorrect and very large relations may result. Notice that Q10 is similar to a CROSS PRODUCT operation followed by a PROJECT operation in relational algebra. If we specify all the attributes of EMPLOYEE and DEPARTMENT in Q10, we get the actual CROSS PRODUCT (except for duplicate elimination, if any).

To retrieve all the attribute values of the selected tuples, we do not have to list the attribute names explicitly in SQL; we just specify an asterisk (*), which stands for all the attributes. For example, query Q1C retrieves all the attribute values of any EMPLOYEE who works in DEPARTMENT number 5; query Q1D retrieves all the attributes of an EMPLOYEE and the attributes of the DEPARTMENT in which he or she works for every employee of the ‘Research’ department; and Q10A specifies the CROSS PRODUCT of the EMPLOYEE and DEPARTMENT relations.

Q1C: SELECT *
     FROM EMPLOYEE
     WHERE Dno=5;

Q1D: SELECT *
     FROM EMPLOYEE, DEPARTMENT
     WHERE Dname='Research' AND Dno=Dnumber;

Q10A: SELECT *
      FROM EMPLOYEE, DEPARTMENT;

4. Tables as Sets in SQL

As we mentioned earlier, SQL usually treats a table not as a set but rather as a multiset; duplicate tuples can appear more than once in a table, and in the result of a query. SQL does not automatically eliminate duplicate tuples in the results of queries, for the following reasons:
■ Duplicate elimination is an expensive operation. One way to implement it is to sort the tuples first and then eliminate duplicates.
■ The user may want to see duplicate tuples in the result of a query.
■ When an aggregate function is applied to tuples, in most cases we do not want to eliminate duplicates.
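The effect of a missing join condition, and the multiset behaviour listed above, can be observed with a small experiment. The sketch below is a minimal illustration assuming SQLite through Python's standard sqlite3 module; the reduced table definitions and the four sample employees are hypothetical and chosen only to keep the counts easy to check.

```python
import sqlite3

# Hypothetical, reduced tables: 4 employees and 2 departments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Salary INTEGER, Dno INTEGER);
CREATE TABLE DEPARTMENT (Dname TEXT, Dnumber INTEGER PRIMARY KEY);
INSERT INTO EMPLOYEE VALUES
  ('123456789', 30000, 5), ('333445555', 40000, 5),
  ('999887777', 25000, 4), ('987987987', 25000, 4);
INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Administration', 4);
""")

# Q10: no WHERE clause, so every employee is paired with every department.
cross = conn.execute("SELECT Ssn, Dname FROM EMPLOYEE, DEPARTMENT").fetchall()

# With the join condition, only the pairs that belong together remain.
joined = conn.execute(
    "SELECT Ssn, Dname FROM EMPLOYEE, DEPARTMENT WHERE Dno = Dnumber"
).fetchall()

# Projecting a non-key column keeps duplicates: the result is a multiset.
salaries = [row[0] for row in conn.execute("SELECT Salary FROM EMPLOYEE")]

print(len(cross), len(joined))  # 8 4
print(sorted(salaries))         # [25000, 25000, 30000, 40000]
```

The cross product has 4 × 2 = 8 rows, while the properly joined query has one row per employee; the salary 25000 appears twice because nothing asked SQL to remove the duplicate.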
An SQL table with a key is restricted to being a set, since the key value must be distinct in each tuple. If we do want to eliminate duplicate tuples from the result of an SQL query, we use the keyword DISTINCT in the SELECT clause, meaning that only distinct tuples should remain in the result. In general, a query with SELECT DISTINCT eliminates duplicates, whereas a query with SELECT ALL does not. Specifying SELECT with neither ALL nor DISTINCT, as in our previous examples, is equivalent to SELECT ALL.

For example, Q11 retrieves the salary of every employee; if several employees have the same salary, that salary value will appear that many times in the result of the query, as shown in Figure 4.4(a). If we are interested only in distinct salary values, we want each value to appear only once, regardless of how many employees earn that salary. By using the keyword DISTINCT as in Q11A, we accomplish this, as shown in Figure 4.4(b).

Query 11. Retrieve the salary of every employee (Q11) and all distinct salary values (Q11A).

Q11: SELECT ALL Salary
     FROM EMPLOYEE;

Q11A: SELECT DISTINCT Salary
      FROM EMPLOYEE;

Set operations

SQL has directly incorporated some of the set operations from mathematical set theory, which are also part of relational algebra. There are set union (UNION), set difference (EXCEPT), and set intersection (INTERSECT) operations. The relations resulting from these set operations are sets of tuples; that is, duplicate tuples are eliminated from the result. These set operations apply only to union-compatible relations, so we must make sure that the two relations on which we apply the operation have the same attributes and that the attributes appear in the same order in both relations. The next example illustrates the use of UNION.

Query 4.
Make a list of all project numbers for projects that involve an employee whose last name is ‘Smith’, either as a worker or as a manager of the department that controls the project.

Q4A: ( SELECT DISTINCT Pnumber
       FROM PROJECT, DEPARTMENT, EMPLOYEE
       WHERE Dnum=Dnumber AND Mgr_ssn=Ssn AND Lname='Smith' )
     UNION
     ( SELECT DISTINCT Pnumber
       FROM PROJECT, WORKS_ON, EMPLOYEE
       WHERE Pnumber=Pno AND Essn=Ssn AND Lname='Smith' );

The first SELECT query retrieves the projects that involve a ‘Smith’ as manager of the department that controls the project, and the second retrieves the projects that involve a ‘Smith’ as a worker on the project. Notice that if several employees have the last name ‘Smith’, the project numbers involving any of them will be retrieved. Applying the UNION operation to the two SELECT queries gives the desired result.

SQL also has corresponding multiset operations, which are followed by the keyword ALL (UNION ALL, EXCEPT ALL, INTERSECT ALL). Their results are multisets (duplicates are not eliminated). The behavior of these operations is illustrated by the examples in Figure 4.5. Basically, each tuple, whether it is a duplicate or not, is considered as a different tuple when applying these operations.

5. Substring Pattern Matching and Arithmetic Operators

In this section we discuss several more features of SQL. The first feature allows comparison conditions on only parts of a character string, using the LIKE comparison operator. This can be used for string pattern matching. Partial strings are specified using two reserved characters: % replaces an arbitrary number of zero or more characters, and the underscore (_) replaces a single character. For example, consider the following query.

Query 12. Retrieve all employees whose address is in Houston, Texas.
Q12: SELECT Fname, Lname
     FROM EMPLOYEE
     WHERE Address LIKE '%Houston,TX%';

To retrieve all employees who were born during the 1950s, we can use Query 12A. Here, ‘5’ must be the third character of the string (according to our format for date), so we use the value '__5_______' (two underscores, the digit 5, then seven underscores), with each underscore serving as a placeholder for an arbitrary character.

Query 12A. Find all employees who were born during the 1950s.

Q12A: SELECT Fname, Lname
      FROM EMPLOYEE
      WHERE Bdate LIKE '__5_______';

Another feature allows the use of arithmetic in queries. The standard arithmetic operators for addition (+), subtraction (-), multiplication (*), and division (/) can be applied to numeric values or attributes with numeric domains. For example, suppose that we want to see the effect of giving all employees who work on the ‘ProductX’ project a 10 percent raise; we can issue Query 13 to see what their salaries would become. This example also shows how we can rename an attribute in the query result using AS in the SELECT clause.

Query 13. Show the resulting salaries if every employee working on the ‘ProductX’ project is given a 10 percent raise.

Q13: SELECT E.Fname, E.Lname, 1.1 * E.Salary AS Increased_sal
     FROM EMPLOYEE AS E, WORKS_ON AS W, PROJECT AS P
     WHERE E.Ssn=W.Essn AND W.Pno=P.Pnumber AND P.Pname='ProductX';

For string data types, the concatenate operator || can be used in a query to append two string values. For date, time, timestamp, and interval data types, operators include incrementing (+) or decrementing (-) a date, time, or timestamp by an interval. In addition, an interval value is the result of the difference between two date, time, or timestamp values.

Another comparison operator, which can be used for convenience, is BETWEEN, which is illustrated in Query 14.

Query 14.
Retrieve all employees in department 5 whose salary is between $30,000 and $40,000.

Q14: SELECT *
     FROM EMPLOYEE
     WHERE (Salary BETWEEN 30000 AND 40000) AND Dno = 5;

The condition (Salary BETWEEN 30000 AND 40000) in Q14 is equivalent to the condition ((Salary >= 30000) AND (Salary <= 40000)).

6. Ordering of Query Results

SQL allows the user to order the tuples in the result of a query by the values of one or more of the attributes that appear in the query result, by using the ORDER BY clause. This is illustrated by Query 15.

Query 15. Retrieve a list of employees and the projects they are working on, ordered by department and, within each department, ordered alphabetically by last name, then first name.

Q15: SELECT D.Dname, E.Lname, E.Fname, P.Pname
     FROM DEPARTMENT D, EMPLOYEE E, WORKS_ON W, PROJECT P
     WHERE D.Dnumber=E.Dno AND E.Ssn=W.Essn AND W.Pno=P.Pnumber
     ORDER BY D.Dname, E.Lname, E.Fname;

The default order is in ascending order of values. We can specify the keyword DESC if we want to see the result in a descending order of values. The keyword ASC can be used to specify ascending order explicitly. For example, if we want descending alphabetical order on Dname and ascending order on Lname, Fname, the ORDER BY clause of Q15 can be written as

ORDER BY D.Dname DESC, E.Lname ASC, E.Fname ASC

7. Discussion and Summary of Basic SQL Retrieval Queries

A simple retrieval query in SQL can consist of up to four clauses, but only the first two, SELECT and FROM, are mandatory. The clauses are specified in the following order, with the clauses between square brackets [ ... ] being optional:

SELECT <attribute list>
FROM <table list>
[ WHERE <condition> ]
[ ORDER BY <attribute list> ];

The SELECT clause lists the attributes to be retrieved, and the FROM clause specifies all relations (tables) needed in the simple query.
The WHERE clause identifies the conditions for selecting the tuples from these relations, including join conditions if needed. ORDER BY specifies an order for displaying the results of a query.

INSERT, DELETE, and UPDATE Statements in SQL

In SQL, three commands can be used to modify the database: INSERT, DELETE, and UPDATE. We discuss each of these in turn.

1. The INSERT Command

In its simplest form, INSERT is used to add a single tuple to a relation. We must specify the relation name and a list of values for the tuple. The values should be listed in the same order in which the corresponding attributes were specified in the CREATE TABLE command. For example,

U1: INSERT INTO EMPLOYEE
    VALUES ( 'Richard', 'K', 'Marini', '653298653', '1962-12-30', '98 Oak Forest, Katy, TX', 'M', 37000, '653298653', 4 );

A second form of the INSERT statement allows the user to specify explicit attribute names that correspond to the values provided in the INSERT command. This is useful if a relation has many attributes but only a few of those attributes are assigned values in the new tuple. However, the values must include all attributes with NOT NULL specification and no default value. Attributes with NULL allowed or DEFAULT values are the ones that can be left out. For example, to enter a tuple for a new EMPLOYEE for whom we know only the Fname, Lname, Dno, and Ssn attributes, we can use U1A:

U1A: INSERT INTO EMPLOYEE (Fname, Lname, Dno, Ssn)
     VALUES ('Richard', 'Marini', 4, '653298653');

Attributes not specified in U1A are set to their DEFAULT or to NULL, and the values are listed in the same order as the attributes are listed in the INSERT command itself. It is also possible to insert into a relation multiple tuples separated by commas in a single INSERT command. The attribute values forming each tuple are enclosed in parentheses.

2.
The DELETE Command

The DELETE command removes tuples from a relation. It includes a WHERE clause, similar to that used in an SQL query, to select the tuples to be deleted. Tuples are explicitly deleted from only one table at a time. However, the deletion may propagate to tuples in other relations if referential triggered actions are specified in the referential integrity constraints of the DDL.

Depending on the number of tuples selected by the condition in the WHERE clause, zero, one, or several tuples can be deleted by a single DELETE command. A missing WHERE clause specifies that all tuples in the relation are to be deleted; however, the table remains in the database as an empty table. We must use the DROP TABLE command to remove the table definition. The DELETE commands in U4A to U4D, if applied independently to the database, will delete zero, one, four, and all tuples, respectively, from the EMPLOYEE relation:

U4A: DELETE FROM EMPLOYEE
     WHERE Lname='Brown';

U4B: DELETE FROM EMPLOYEE
     WHERE Ssn='123456789';

U4C: DELETE FROM EMPLOYEE
     WHERE Dno=5;

U4D: DELETE FROM EMPLOYEE;

3. The UPDATE Command

The UPDATE command is used to modify attribute values of one or more selected tuples. As in the DELETE command, a WHERE clause in the UPDATE command selects the tuples to be modified from a single relation. However, updating a primary key value may propagate to the foreign key values of tuples in other relations if such a referential triggered action is specified in the referential integrity constraints of the DDL. An additional SET clause in the UPDATE command specifies the attributes to be modified and their new values. For example, to change the location and controlling department number of project number 10 to ‘Bellaire’ and 5, respectively, we use U5:

U5: UPDATE PROJECT
    SET Plocation = 'Bellaire', Dnum = 5
    WHERE Pnumber=10;

Several tuples can be modified with a single UPDATE command.
An example is to give all employees in the ‘Research’ department (department number 5) a 10 percent raise in salary, as shown in U6. In this request, the modified Salary value depends on the original Salary value in each tuple, so two references to the Salary attribute are needed. In the SET clause, the reference to the Salary attribute on the right refers to the old Salary value before modification, and the one on the left refers to the new Salary value after modification:

U6: UPDATE EMPLOYEE
    SET Salary = Salary * 1.1
    WHERE Dno = 5;

It is also possible to specify NULL or DEFAULT as the new attribute value. Notice that each UPDATE command explicitly refers to a single relation only. To modify multiple relations, we must issue several UPDATE commands.

More Complex SQL Retrieval Queries

1. Comparisons Involving NULL and Three-Valued Logic

SQL has various rules for dealing with NULL values. Recall from Section 3.1.2 that NULL is used to represent a missing value, but that it usually has one of three different interpretations: value unknown (exists but is not known), value not available (exists but is purposely withheld), or value not applicable (the attribute is undefined for this tuple). Consider the following examples to illustrate each of the meanings of NULL.
1. Unknown value. A person’s date of birth is not known, so it is represented by NULL in the database.
2. Unavailable or withheld value. A person has a home phone but does not want it to be listed, so it is withheld and represented as NULL in the database.
3. Not applicable attribute. An attribute LastCollegeDegree would be NULL for a person who has no college degrees because it does not apply to that person.

Because a comparison of NULL with any value using = or <> evaluates to UNKNOWN rather than TRUE or FALSE, SQL provides the operators IS NULL and IS NOT NULL to check whether an attribute value is NULL.

Query 18. Retrieve the names of all employees who do not have supervisors.

Q18: SELECT Fname, Lname
     FROM EMPLOYEE
     WHERE Super_ssn IS NULL;

2.
Nested Queries, Tuples, and Set/Multiset Comparisons

Some queries require that existing values in the database be fetched and then used in a comparison condition. Such queries can be conveniently formulated by using nested queries, which are complete select-from-where blocks within the WHERE clause of another query. That other query is called the outer query. Query 4 was formulated earlier (using UNION) without a nested query, but it can be rephrased to use nested queries as shown in Q4A below. Q4A introduces the comparison operator IN, which compares a value v with a set (or multiset) of values V and evaluates to TRUE if v is one of the elements in V.

The first nested query selects the project numbers of projects that have an employee with last name ‘Smith’ involved as manager, while the second nested query selects the project numbers of projects that have an employee with last name ‘Smith’ involved as worker. In the outer query, we use the OR logical connective to retrieve a PROJECT tuple if the Pnumber value of that tuple is in the result of either nested query.

Q4A: SELECT DISTINCT Pnumber
     FROM PROJECT
     WHERE Pnumber IN
             ( SELECT Pnumber
               FROM PROJECT, DEPARTMENT, EMPLOYEE
               WHERE Dnum=Dnumber AND Mgr_ssn=Ssn AND Lname='Smith' )
           OR
           Pnumber IN
             ( SELECT Pno
               FROM WORKS_ON, EMPLOYEE
               WHERE Essn=Ssn AND Lname='Smith' );

If a nested query returns a single attribute and a single tuple, the query result will be a single (scalar) value. In such cases, it is permissible to use = instead of IN for the comparison operator. In general, the nested query will return a table (relation), which is a set or multiset of tuples.

SQL allows the use of tuples of values in comparisons by placing them within parentheses.
To illustrate this, consider the following query:

SELECT DISTINCT Essn
FROM WORKS_ON
WHERE (Pno, Hours) IN ( SELECT Pno, Hours
                        FROM WORKS_ON
                        WHERE Essn='123456789' );

This query will select the Essns of all employees who work the same (project, hours) combination on some project that employee ‘John Smith’ (whose Ssn = ‘123456789’) works on. In this example, the IN operator compares the subtuple of values in parentheses (Pno, Hours) within each tuple in WORKS_ON with the set of type-compatible tuples produced by the nested query.

In addition to the IN operator, a number of other comparison operators can be used to compare a single value v (typically an attribute name) to a set or multiset V (typically a nested query). The = ANY (or = SOME) operator returns TRUE if the value v is equal to some value in the set V and is hence equivalent to IN. The two keywords ANY and SOME have the same effect. Other operators that can be combined with ANY (or SOME) include >, >=, <, <=, and <>. The keyword ALL can also be combined with each of these operators. For example, the comparison condition (v > ALL V) returns TRUE if the value v is greater than all the values in the set (or multiset) V. An example is the following query, which returns the names of employees whose salary is greater than the salary of all the employees in department 5:

SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ALL ( SELECT Salary
                     FROM EMPLOYEE
                     WHERE Dno=5 );

Query 16. Retrieve the name of each employee who has a dependent with the same first name and is the same sex as the employee.

Q16: SELECT E.Fname, E.Lname
     FROM EMPLOYEE AS E
     WHERE E.Ssn IN ( SELECT Essn
                      FROM DEPENDENT AS D
                      WHERE E.Fname=D.Dependent_name AND E.Sex=D.Sex );

3.
Correlated Nested Queries

Whenever a condition in the WHERE clause of a nested query references some attribute of a relation declared in the outer query, the two queries are said to be correlated. We can understand a correlated query better by considering that the nested query is evaluated once for each tuple (or combination of tuples) in the outer query. For example, we can think of Q16 as follows: For each EMPLOYEE tuple, evaluate the nested query, which retrieves the Essn values for all DEPENDENT tuples with the same sex and name as that EMPLOYEE tuple; if the Ssn value of the EMPLOYEE tuple is in the result of the nested query, then select that EMPLOYEE tuple.

In general, a query written with nested select-from-where blocks and using the = or IN comparison operators can always be expressed as a single block query. For example, Q16 may be written as in Q16A:

Q16A: SELECT E.Fname, E.Lname
      FROM EMPLOYEE AS E, DEPENDENT AS D
      WHERE E.Ssn=D.Essn AND E.Sex=D.Sex AND E.Fname=D.Dependent_name;

4. The EXISTS and UNIQUE Functions in SQL

The EXISTS function in SQL is used to check whether the result of a correlated nested query is empty (contains no tuples) or not. The result of EXISTS is a Boolean value TRUE if the nested query result contains at least one tuple, or FALSE if the nested query result contains no tuples. We illustrate the use of EXISTS, and of NOT EXISTS, with some examples. First, we formulate Query 16 in an alternative form that uses EXISTS, as in Q16B:

Q16B: SELECT E.Fname, E.Lname
      FROM EMPLOYEE AS E
      WHERE EXISTS ( SELECT *
                     FROM DEPENDENT AS D
                     WHERE E.Ssn=D.Essn AND E.Sex=D.Sex AND E.Fname=D.Dependent_name );

EXISTS and NOT EXISTS are typically used in conjunction with a correlated nested query.
In Q16B, the nested query references the Ssn, Fname, and Sex attributes of the EMPLOYEE relation from the outer query. We can think of Q16B as follows: For each EMPLOYEE tuple, evaluate the nested query, which retrieves all DEPENDENT tuples with the same Essn, Sex, and Dependent_name as the EMPLOYEE tuple; if at least one tuple EXISTS in the result of the nested query, then select that EMPLOYEE tuple. In general, EXISTS(Q) returns TRUE if there is at least one tuple in the result of the nested query Q, and it returns FALSE otherwise. On the other hand, NOT EXISTS(Q) returns TRUE if there are no tuples in the result of nested query Q, and it returns FALSE otherwise. Next, we illustrate the use of NOT EXISTS.

Query 6. Retrieve the names of employees who have no dependents.

Q6: SELECT Fname, Lname
    FROM EMPLOYEE
    WHERE NOT EXISTS ( SELECT *
                       FROM DEPENDENT
                       WHERE Ssn=Essn );

In Q6, the correlated nested query retrieves all DEPENDENT tuples related to a particular EMPLOYEE tuple. If none exist, the EMPLOYEE tuple is selected because the WHERE-clause condition will evaluate to TRUE in this case. We can explain Q6 as follows: For each EMPLOYEE tuple, the correlated nested query selects all DEPENDENT tuples whose Essn value matches the EMPLOYEE Ssn; if the result is empty, no dependents are related to the employee, so we select that EMPLOYEE tuple and retrieve its Fname and Lname.

Query 7. List the names of managers who have at least one dependent.

Q7: SELECT Fname, Lname
    FROM EMPLOYEE
    WHERE EXISTS ( SELECT *
                   FROM DEPENDENT
                   WHERE Ssn=Essn )
          AND
          EXISTS ( SELECT *
                   FROM DEPARTMENT
                   WHERE Ssn=Mgr_ssn );

One way to write this query is shown in Q7, where we specify two nested correlated queries; the first selects all DEPENDENT tuples related to an EMPLOYEE, and the second selects all DEPARTMENT tuples managed by the EMPLOYEE.
If at least one of the first and at least one of the second exists, we select the EMPLOYEE tuple. Can you rewrite this query using only a single nested query or no nested queries?

The query Q3: Retrieve the name of each employee who works on all the projects controlled by department number 5 can be written using EXISTS and NOT EXISTS in SQL systems. We show two ways of specifying this query Q3 in SQL as Q3A and Q3B.

5. Explicit Sets and Renaming of Attributes in SQL

We have seen several queries with a nested query in the WHERE clause. It is also possible to use an explicit set of values in the WHERE clause, rather than a nested query. Such a set is enclosed in parentheses in SQL.

Query 17. Retrieve the Social Security numbers of all employees who work on project numbers 1, 2, or 3.

Q17: SELECT DISTINCT Essn
     FROM WORKS_ON
     WHERE Pno IN (1, 2, 3);

In SQL, it is possible to rename any attribute that appears in the result of a query by adding the qualifier AS followed by the desired new name. Hence, the AS construct can be used to alias both attribute and relation names, and it can be used in both the SELECT and FROM clauses. For example, Q8A shows how query Q8 can be slightly changed to retrieve the last name of each employee and his or her supervisor, while renaming the resulting attribute names as Employee_name and Supervisor_name. The new names will appear as column headers in the query result.

Q8A: SELECT E.Lname AS Employee_name, S.Lname AS Supervisor_name
     FROM EMPLOYEE AS E, EMPLOYEE AS S
     WHERE E.Super_ssn=S.Ssn;

6. Joined Tables in SQL and Outer Joins

The concept of a joined table (or joined relation) was incorporated into SQL to permit users to specify a table resulting from a join operation in the FROM clause of a query. This construct may be easier to comprehend than mixing together all the select and join conditions in the WHERE clause.
For example, consider query Q1, which retrieves the name and address of every employee who works for the ‘Research’ department. It may be easier to specify the join of the EMPLOYEE and DEPARTMENT relations first, and then to select the desired tuples and attributes. This can be written in SQL as in Q1A:

Q1A: SELECT Fname, Lname, Address
     FROM (EMPLOYEE JOIN DEPARTMENT ON Dno=Dnumber)
     WHERE Dname='Research';

The FROM clause in Q1A contains a single joined table. The attributes of such a table are all the attributes of the first table, EMPLOYEE, followed by all the attributes of the second table, DEPARTMENT. The concept of a joined table also allows the user to specify different types of join, such as NATURAL JOIN and various types of OUTER JOIN.

In a NATURAL JOIN on two relations R and S, no join condition is specified; an implicit EQUIJOIN condition for each pair of attributes with the same name from R and S is created. Each such pair of attributes is included only once in the resulting relation.

If the names of the join attributes are not the same in the base relations, it is possible to rename the attributes so that they match, and then to apply NATURAL JOIN. In this case, the AS construct can be used to rename a relation and all its attributes in the FROM clause. This is illustrated in Q1B, where the DEPARTMENT relation is renamed as DEPT and its attributes are renamed as Dname, Dno (to match the name of the desired join attribute Dno in the EMPLOYEE table), Mssn, and Msdate.
The implied join condition for this NATURAL JOIN is EMPLOYEE.Dno=DEPT.Dno, because this is the only pair of attributes with the same name after renaming:

Q1B: SELECT Fname, Lname, Address
     FROM (EMPLOYEE NATURAL JOIN (DEPARTMENT AS DEPT (Dname, Dno, Mssn, Msdate)))
     WHERE Dname='Research';

The default type of join in a joined table is called an inner join, where a tuple is included in the result only if a matching tuple exists in the other relation. For example, in query Q8A, only employees who have a supervisor are included in the result; an EMPLOYEE tuple whose value for Super_ssn is NULL is excluded. If the user requires that all employees be included, an OUTER JOIN must be used explicitly. In SQL, this is handled by specifying the keyword OUTER JOIN in a joined table, as illustrated in Q8B:

Q8B: SELECT E.Lname AS Employee_name, S.Lname AS Supervisor_name
     FROM (EMPLOYEE AS E LEFT OUTER JOIN EMPLOYEE AS S ON E.Super_ssn=S.Ssn);

There are a variety of outer join operations. In SQL, the options available for specifying joined tables include:
■ INNER JOIN: only pairs of tuples that match the join condition are retrieved (same as JOIN).
■ LEFT OUTER JOIN: every tuple in the left table must appear in the result; if it does not have a matching tuple, it is padded with NULL values for the attributes of the right table.
■ RIGHT OUTER JOIN: every tuple in the right table must appear in the result; if it does not have a matching tuple, it is padded with NULL values for the attributes of the left table.
■ FULL OUTER JOIN.

In the latter three options, the keyword OUTER may be omitted. If the join attributes have the same name, one can also specify the natural join variation of outer joins by using the keyword NATURAL before the operation (for example, NATURAL LEFT OUTER JOIN).
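The difference between the default inner join of Q8A and the LEFT OUTER JOIN of Q8B can be checked on a tiny table. The following is a minimal sketch, assuming SQLite through Python's standard sqlite3 module; the three-row EMPLOYEE table is a hypothetical reduction of the COMPANY schema, the parentheses around the joined table are dropped, and ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Hypothetical three-employee table; Borg has no supervisor (Super_ssn is NULL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (Lname TEXT, Ssn TEXT PRIMARY KEY, Super_ssn TEXT);
INSERT INTO EMPLOYEE VALUES
  ('Borg', '888665555', NULL),
  ('Wong', '333445555', '888665555'),
  ('Smith', '123456789', '333445555');
""")

# Inner join: employees without a matching supervisor tuple are dropped.
inner = conn.execute("""
SELECT E.Lname AS Employee_name, S.Lname AS Supervisor_name
FROM EMPLOYEE AS E JOIN EMPLOYEE AS S ON E.Super_ssn = S.Ssn
ORDER BY E.Lname
""").fetchall()

# Left outer join: every employee appears; a missing supervisor becomes NULL.
left_outer = conn.execute("""
SELECT E.Lname AS Employee_name, S.Lname AS Supervisor_name
FROM EMPLOYEE AS E LEFT OUTER JOIN EMPLOYEE AS S ON E.Super_ssn = S.Ssn
ORDER BY E.Lname
""").fetchall()

print(inner)       # [('Smith', 'Wong'), ('Wong', 'Borg')]
print(left_outer)  # [('Borg', None), ('Smith', 'Wong'), ('Wong', 'Borg')]
```

Python shows the SQL NULL as None: Borg is missing from the inner join but appears in the outer join, padded with NULL for the supervisor's attributes.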
The keyword CROSS JOIN is used to specify the CARTESIAN PRODUCT operation, although this should be used only with the utmost care because it generates all possible tuple combinations.

It is also possible to nest join specifications; that is, one of the tables in a join may itself be a joined table. This allows the specification of the join of three or more tables as a single joined table, which is called a multiway join. For example, Q2A is a different way of specifying query Q2 from the previous section using the concept of a joined table:

Q2A: SELECT Pnumber, Dnum, Lname, Address, Bdate
FROM ((PROJECT JOIN DEPARTMENT ON Dnum=Dnumber) JOIN EMPLOYEE ON Mgr_ssn=Ssn)
WHERE Plocation='Stafford';

Not all SQL implementations have implemented the new syntax of joined tables. In some systems, a different syntax was used to specify outer joins by using the comparison operators +=, =+, and +=+ for left, right, and full outer join, respectively, when specifying the join condition. Oracle, for example, offers a legacy notation in this spirit, although it is written by placing (+) after the column of the table whose tuples may be padded with NULLs rather than with a += operator. To specify the left outer join in Q8B using the operator syntax, we could write the query Q8C as follows:

Q8C: SELECT E.Lname, S.Lname
FROM EMPLOYEE E, EMPLOYEE S
WHERE E.Super_ssn += S.Ssn;

7. Aggregate Functions in SQL

Aggregate functions are used to summarize information from multiple tuples into a single-tuple summary. Grouping is used to create subgroups of tuples before summarization. Grouping and aggregation are required in many database applications, and we will introduce their use in SQL through examples. A number of built-in aggregate functions exist: COUNT, SUM, MAX, MIN, and AVG. The COUNT function returns the number of tuples or values as specified in a query. The functions SUM, MAX, MIN, and AVG can be applied to a set or multiset of numeric values and return, respectively, the sum, maximum value, minimum value, and average (mean) of those values. These functions can be used in the SELECT clause or in a HAVING clause (which we introduce later).
The functions MAX and MIN can also be used with attributes that have nonnumeric domains if the domain values have a total ordering among one another. We illustrate the use of these functions with sample queries.

Query 19. Find the sum of the salaries of all employees, the maximum salary, the minimum salary, and the average salary.

Q19: SELECT SUM (Salary), MAX (Salary), MIN (Salary), AVG (Salary)
FROM EMPLOYEE;

If we want to get the preceding function values for employees of a specific department, say the ‘Research’ department, we can write Query 20, where the EMPLOYEE tuples are restricted by the WHERE clause to those employees who work for the ‘Research’ department.

Query 20. Find the sum of the salaries of all employees of the ‘Research’ department, as well as the maximum salary, the minimum salary, and the average salary in this department.

Q20: SELECT SUM (Salary), MAX (Salary), MIN (Salary), AVG (Salary)
FROM (EMPLOYEE JOIN DEPARTMENT ON Dno=Dnumber)
WHERE Dname='Research';

Queries 21 and 22. Retrieve the total number of employees in the company (Q21) and the number of employees in the ‘Research’ department (Q22).

Q21: SELECT COUNT (*)
FROM EMPLOYEE;

Q22: SELECT COUNT (*)
FROM EMPLOYEE, DEPARTMENT
WHERE Dno=Dnumber AND Dname='Research';

Here the asterisk (*) refers to the rows (tuples), so COUNT (*) returns the number of rows in the result of the query. We may also use the COUNT function to count values in a column rather than tuples, as in the next example.

Query 23. Count the number of distinct salary values in the database.

Q23: SELECT COUNT (DISTINCT Salary)
FROM EMPLOYEE;

If we write COUNT (Salary) instead of COUNT (DISTINCT Salary) in Q23, then duplicate values will not be eliminated. However, any tuples with NULL for Salary will not be counted.
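The three forms of COUNT discussed above are easy to confuse, so a small check may help. This is a sketch only, assuming Python's sqlite3 module as the engine and four hypothetical EMPLOYEE rows, one of which has a NULL salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Salary INTEGER);
    INSERT INTO EMPLOYEE VALUES ('1', 30000);
    INSERT INTO EMPLOYEE VALUES ('2', 30000);  -- duplicate salary value
    INSERT INTO EMPLOYEE VALUES ('3', 40000);
    INSERT INTO EMPLOYEE VALUES ('4', NULL);   -- salary unknown
""")

counts = conn.execute("""
    SELECT COUNT(*),               -- all rows, NULL salary included
           COUNT(Salary),          -- non-NULL salaries, duplicates kept
           COUNT(DISTINCT Salary), -- non-NULL salaries, duplicates removed
           AVG(Salary)             -- mean over the non-NULL salaries only
    FROM EMPLOYEE
""").fetchone()

print(counts[:3])  # (4, 3, 2)
print(counts[3])   # 100000 / 3, since the NULL row is ignored
```

The average is taken over three values, not four, because the NULL salary is discarded before the function is applied.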
In general, NULL values are discarded when aggregate functions are applied to a particular column (attribute).

The preceding examples summarize a whole relation (Q19, Q21, Q23) or a selected subset of tuples (Q20, Q22), and hence all produce single tuples or single values. They illustrate how functions are applied to retrieve a summary value or summary tuple from the database. These functions can also be used in selection conditions involving nested queries. We can specify a correlated nested query with an aggregate function, and then use the nested query in the WHERE clause of an outer query. For example, to retrieve the names of all employees who have two or more dependents (Query 5), we can write the following:

Q5: SELECT Lname, Fname
FROM EMPLOYEE
WHERE (SELECT COUNT (*)
       FROM DEPENDENT
       WHERE Ssn=Essn) >= 2;

The correlated nested query counts the number of dependents that each employee has; if this is greater than or equal to two, the employee tuple is selected.

8. Grouping: The GROUP BY and HAVING Clauses

In many cases we want to apply the aggregate functions to subgroups of tuples in a relation, where the subgroups are based on some attribute values. For example, we may want to find the average salary of employees in each department or the number of employees who work on each project. In these cases we need to partition the relation into nonoverlapping subsets (or groups) of tuples. Each group (partition) will consist of the tuples that have the same value of some attribute(s), called the grouping attribute(s). We can then apply the function to each such group independently to produce summary information about each group. SQL has a GROUP BY clause for this purpose.
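The partitioning described above can be sketched on a few rows. The example assumes Python's sqlite3 module as the engine and hypothetical EMPLOYEE data spread over three departments; it groups on Dno and applies COUNT and AVG to each group independently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Salary INTEGER, Dno INTEGER);
    INSERT INTO EMPLOYEE VALUES ('1', 30000, 5);
    INSERT INTO EMPLOYEE VALUES ('2', 40000, 5);
    INSERT INTO EMPLOYEE VALUES ('3', 25000, 4);
    INSERT INTO EMPLOYEE VALUES ('4', 55000, 1);
""")

# One result row per distinct Dno value (the grouping attribute).
groups = conn.execute("""
    SELECT Dno, COUNT(*), AVG(Salary)
    FROM EMPLOYEE
    GROUP BY Dno
    ORDER BY Dno
""").fetchall()

for dno, n_emps, avg_sal in groups:
    print(dno, n_emps, avg_sal)
# 1 1 55000.0
# 4 1 25000.0
# 5 2 35000.0
```

Four tuples collapse into three groups, because the two employees of department 5 share the same grouping value.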
The GROUP BY clause specifies the grouping attributes, which should also appear in the SELECT clause, so that the value resulting from applying each aggregate function to a group of tuples appears along with the value of the grouping attribute(s).

Query 24. For each department, retrieve the department number, the number of employees in the department, and their average salary.

Q24: SELECT Dno, COUNT (*), AVG (Salary)
FROM EMPLOYEE
GROUP BY Dno;

Query 25. For each project, retrieve the project number, the project name, and the number of employees who work on that project.

Q25: SELECT Pnumber, Pname, COUNT (*)
FROM PROJECT, WORKS_ON
WHERE Pnumber=Pno
GROUP BY Pnumber, Pname;

Q25 shows how we can use a join condition in conjunction with GROUP BY. In this case, the grouping and functions are applied after the joining of the two relations.

Sometimes we want to retrieve the values of these functions only for groups that satisfy certain conditions. For example, suppose that we want to modify Query 25 so that only projects with more than two employees appear in the result. SQL provides a HAVING clause, which can appear in conjunction with a GROUP BY clause, for this purpose. HAVING provides a condition on the summary information regarding the group of tuples associated with each value of the grouping attributes. Only the groups that satisfy the condition are retrieved in the result of the query. This is illustrated by Query 26.

Query 26. For each project on which more than two employees work, retrieve the project number, the project name, and the number of employees who work on the project.
Q26: SELECT Pnumber, Pname, COUNT (*)
FROM PROJECT, WORKS_ON
WHERE Pnumber=Pno
GROUP BY Pnumber, Pname
HAVING COUNT (*) > 2;

Notice that while selection conditions in the WHERE clause limit the tuples to which functions are applied, the HAVING clause serves to choose whole groups.

Query 27. For each project, retrieve the project number, the project name, and the number of employees from department 5 who work on the project.

Q27: SELECT Pnumber, Pname, COUNT (*)
FROM PROJECT, WORKS_ON, EMPLOYEE
WHERE Pnumber=Pno AND Ssn=Essn AND Dno=5
GROUP BY Pnumber, Pname;

Query 28. For each department that has more than five employees, retrieve the department number and the number of its employees who are making more than $40,000.

Q28: SELECT Dnumber, COUNT (*)
FROM DEPARTMENT, EMPLOYEE
WHERE Dnumber=Dno AND Salary>40000 AND
      Dno IN (SELECT Dno
              FROM EMPLOYEE
              GROUP BY Dno
              HAVING COUNT (*) > 5)
GROUP BY Dnumber;

The nested query selects the departments that have more than five employees; the outer query then counts, within those departments only, the employees whose salary exceeds $40,000.

Views (Virtual Tables) in SQL

1. Concept of a View in SQL

A view in SQL terminology is a single table that is derived from other tables. These other tables can be base tables or previously defined views. A view does not necessarily exist in physical form; it is considered to be a virtual table, in contrast to base tables, whose tuples are always physically stored in the database. This limits the possible update operations that can be applied to views, but it does not provide any limitations on querying a view.

We can think of a view as a way of specifying a table that we need to reference frequently, even though it may not exist physically. For example, referring to the COMPANY database in Figure 3.5, we may frequently issue queries that retrieve the employee name and the project names that the employee works on.
Rather than having to specify the join of the three tables EMPLOYEE, WORKS_ON, and PROJECT every time we issue this query, we can define a view that is specified as the result of these joins. Then we can issue queries on the view, which are specified as single table retrievals rather than as retrievals involving two joins on three tables. We call the EMPLOYEE, WORKS_ON, and PROJECT tables the defining tables of the view.

2. Specification of Views in SQL

In SQL, the command to specify a view is CREATE VIEW. The view is given a (virtual) table name (or view name), a list of attribute names, and a query to specify the contents of the view. If none of the view attributes results from applying functions or arithmetic operations, we do not have to specify new attribute names for the view, since they would be the same as the names of the attributes of the defining tables in the default case.

V1: CREATE VIEW WORKS_ON1
AS SELECT Fname, Lname, Pname, Hours
FROM EMPLOYEE, PROJECT, WORKS_ON
WHERE Ssn=Essn AND Pno=Pnumber;

V2: CREATE VIEW DEPT_INFO(Dept_name, No_of_emps, Total_sal)
AS SELECT Dname, COUNT (*), SUM (Salary)
FROM DEPARTMENT, EMPLOYEE
WHERE Dnumber=Dno
GROUP BY Dname;

In V1, we did not specify any new attribute names for the view WORKS_ON1 (although we could have); in this case, WORKS_ON1 inherits the names of the view attributes from the defining tables EMPLOYEE, PROJECT, and WORKS_ON. View V2 explicitly specifies new attribute names for the view DEPT_INFO, using a one-to-one correspondence between the attributes specified in the CREATE VIEW clause and those specified in the SELECT clause of the query that defines the view.

We can now specify SQL queries on a view (or virtual table) in the same way we specify queries involving base tables.
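The view WORKS_ON1 can be tried end to end on a tiny database. The following is a sketch under stated assumptions: Python's sqlite3 module stands in for the DBMS, the three defining tables are cut down to the columns the view uses, and all rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Fname TEXT, Lname TEXT);
    CREATE TABLE PROJECT  (Pnumber INTEGER PRIMARY KEY, Pname TEXT);
    CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER, Hours REAL);

    INSERT INTO EMPLOYEE VALUES ('1', 'John', 'Smith'), ('2', 'Joyce', 'English');
    INSERT INTO PROJECT  VALUES (1, 'ProductX'), (2, 'ProductY');
    INSERT INTO WORKS_ON VALUES ('1', 1, 32.5), ('1', 2, 7.5), ('2', 1, 20.0);

    -- The view stores the query, not the result rows.
    CREATE VIEW WORKS_ON1 AS
        SELECT Fname, Lname, Pname, Hours
        FROM EMPLOYEE, PROJECT, WORKS_ON
        WHERE Ssn = Essn AND Pno = Pnumber;
""")

# Query the view exactly like a base table; no join has to be written again.
product_x = conn.execute(
    "SELECT Fname, Lname FROM WORKS_ON1 WHERE Pname = 'ProductX' ORDER BY Lname"
).fetchall()

# A change to a defining table is visible through the view immediately.
conn.execute("INSERT INTO WORKS_ON VALUES ('2', 2, 10.0)")
product_y = conn.execute(
    "SELECT Fname, Lname FROM WORKS_ON1 WHERE Pname = 'ProductY' ORDER BY Lname"
).fetchall()

# Dropping the view removes only its definition; the base tables are untouched.
conn.execute("DROP VIEW WORKS_ON1")
views_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'view'"
).fetchall()
rows_left = conn.execute("SELECT COUNT(*) FROM WORKS_ON").fetchone()[0]

print(product_x)   # [('Joyce', 'English'), ('John', 'Smith')]
print(product_y)   # [('Joyce', 'English'), ('John', 'Smith')]
print(views_left)  # []
print(rows_left)   # 4
```

The second query returns Joyce English only because of the row inserted after the view was created, which is what "virtual table" means in practice. The sqlite_master catalog table is specific to SQLite.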
For example, to retrieve the last name and first name of all employees who work on the ‘ProductX’ project, we can utilize the WORKS_ON1 view and specify the query as in QV1:

QV1: SELECT Fname, Lname
FROM WORKS_ON1
WHERE Pname='ProductX';

If we do not need a view any more, we can use the DROP VIEW command to dispose of it. For example, to get rid of the view V1, we can use the SQL statement in V1A:

V1A: DROP VIEW WORKS_ON1;

Schema Change Statements in SQL

1. The DROP Command

The DROP command can be used to drop named schema elements, such as tables, domains, or constraints. One can also drop a schema. For example, if a whole schema is no longer needed, the DROP SCHEMA command can be used. There are two drop behavior options: CASCADE and RESTRICT. For example, to remove the COMPANY database schema and all its tables, domains, and other elements, the CASCADE option is used as follows:

DROP SCHEMA COMPANY CASCADE;

If the RESTRICT option is chosen in place of CASCADE, the schema is dropped only if it has no elements in it; otherwise, the DROP command will not be executed. To use the RESTRICT option, the user must first individually drop each element in the schema, then drop the schema itself.

If a base relation within a schema is no longer needed, the relation and its definition can be deleted by using the DROP TABLE command:

DROP TABLE DEPENDENT CASCADE;

If the RESTRICT option is chosen instead of CASCADE, a table is dropped only if it is not referenced in any constraints (for example, by foreign key definitions in another relation) or views (see Section 5.3) or by any other elements. With the CASCADE option, all such constraints, views, and other elements that reference the table being dropped are also dropped automatically from the schema, along with the table itself.
Notice that the DROP TABLE command not only deletes all the records in the table if successful, but also removes the table definition from the catalog. If it is desired to delete only the records but to leave the table definition for future use, then the DELETE command should be used instead of DROP TABLE. The DROP command can also be used to drop other types of named schema elements, such as constraints or domains.

2. The ALTER Command

The definition of a base table or of other named schema elements can be changed by using the ALTER command. For base tables, the possible alter table actions include adding or dropping a column (attribute), changing a column definition, and adding or dropping table constraints. For example, to add an attribute for keeping track of jobs of employees to the EMPLOYEE base relation in the COMPANY schema, we can use the command

ALTER TABLE COMPANY.EMPLOYEE ADD COLUMN Job VARCHAR(12);

We must still enter a value for the new attribute Job for each individual EMPLOYEE tuple. This can be done either by specifying a default clause or by using the UPDATE command individually on each tuple (see Section 4.4.3). If no default clause is specified, the new attribute will have NULLs in all the tuples of the relation immediately after the command is executed; hence, the NOT NULL constraint is not allowed in this case.

To drop a column, we must choose either CASCADE or RESTRICT for drop behavior. It is also possible to alter a column definition by dropping an existing default clause or by defining a new default clause.
The following examples illustrate this clause:

ALTER TABLE COMPANY.DEPARTMENT ALTER COLUMN Mgr_ssn DROP DEFAULT;
ALTER TABLE COMPANY.DEPARTMENT ALTER COLUMN Mgr_ssn SET DEFAULT '333445555';

Additional Features of SQL

■ SQL features various techniques for specifying complex retrieval queries, including nested queries, aggregate functions, grouping, joined tables, outer joins, and recursive queries; SQL views, triggers, and assertions; and commands for schema modification.
■ SQL has various techniques for writing programs in various programming languages that include SQL statements to access one or more databases. These include embedded (and dynamic) SQL, SQL/CLI (Call Level Interface) and its predecessor ODBC (Open Data Base Connectivity), and SQL/PSM (Persistent Stored Modules).
■ Each commercial RDBMS will have, in addition to the SQL commands, a set of commands for specifying physical database design parameters, file structures for relations, and access paths such as indexes. We called these commands a storage definition language (SDL). Earlier versions of SQL had commands for creating indexes, but these were removed from the language because they were not at the conceptual schema level. Many systems still have the CREATE INDEX commands.
■ SQL has transaction control commands. These are used to specify units of database processing for concurrency control and recovery purposes.
■ SQL has language constructs for specifying the granting and revoking of privileges to users. Privileges typically correspond to the right to use certain SQL commands to access certain relations. Each relation is assigned an owner, and either the owner or the DBA staff can grant to selected users the privilege to use an SQL statement (such as SELECT, INSERT, DELETE, or UPDATE) to access the relation.
In addition, the DBA staff can grant the privileges to create schemas, tables, or views to certain users. The SQL commands used for this are GRANT and REVOKE.
■ SQL has language constructs for creating triggers. These are generally referred to as active database techniques, since they specify actions that are automatically triggered by events such as database updates.
■ SQL has incorporated many features from object-oriented models to have more powerful capabilities, leading to enhanced relational systems known as object-relational. These capabilities include creating complex-structured attributes (also called nested relations), specifying abstract data types (called UDTs or user-defined types) for attributes and tables, creating object identifiers for referencing tuples, and specifying operations on types.
■ SQL and relational databases can interact with new technologies such as XML and OLAP.

Rollback, Commit and Savepoint

All these statements fall in the category of Transaction Control Statements.

Rollback: This is used for undoing the work done in the current transaction. This command also releases any locks held by the current transaction. The command used in SQL for this is simply:

ROLLBACK;

Savepoint: This is used for identifying a point in the transaction to which a programmer can later roll back. That is, the programmer can divide a big transaction into subsections, each marked by a savepoint. The command used in SQL for this is simply:

SAVEPOINT savepointname;

For example:

UPDATE ...
DELETE ...
SAVEPOINT e1;
INSERT ...
UPDATE ...
SAVEPOINT e2;
...

It is also possible to use savepoint and rollback together so that the programmer can roll back only part of a transaction.
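A partial rollback to a savepoint can be observed directly. The sketch below assumes Python's sqlite3 module as the engine, opened with isolation_level=None so that BEGIN, COMMIT, and ROLLBACK are issued by hand, and a hypothetical ACCOUNT table that is not part of the COMPANY schema. Savepoints e1 and e2 play the same roles as in the outline above.

```python
import sqlite3

# isolation_level=None: the module starts no transactions on its own.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE ACCOUNT (Id INTEGER PRIMARY KEY, Balance INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO ACCOUNT VALUES (1, 100)")
conn.execute("SAVEPOINT e1")
conn.execute("INSERT INTO ACCOUNT VALUES (2, 200)")
conn.execute("SAVEPOINT e2")
conn.execute("INSERT INTO ACCOUNT VALUES (3, 300)")

# Undo only the work done after e2; rows 1 and 2 stay in the transaction.
conn.execute("ROLLBACK TO SAVEPOINT e2")
conn.execute("COMMIT")  # makes the surviving changes permanent

rows = conn.execute("SELECT Id, Balance FROM ACCOUNT ORDER BY Id").fetchall()
print(rows)  # [(1, 100), (2, 200)]

# A plain ROLLBACK undoes the whole current transaction.
conn.execute("BEGIN")
conn.execute("DELETE FROM ACCOUNT")
conn.execute("ROLLBACK")
remaining = conn.execute("SELECT COUNT(*) FROM ACCOUNT").fetchone()[0]
print(remaining)  # 2
```

Locking behaviour and the exact spelling of the statements vary between systems, so this shows the logical effect only.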
For instance, in the sequence of statements with savepoints e1 and e2:

ROLLBACK TO SAVEPOINT e2;

This results in the rollback of all statements after savepoint e2.

Commit: This is used to end the transaction and make the changes permanent. When commit is performed, all savepoints are erased and transaction locks are released. In other words, commit ends a transaction and marks the beginning of a new transaction. The command used in SQL for this is simply:

COMMIT;

An Example to Illustrate Granting and Revoking of Privileges

Suppose that the DBA creates four accounts (A1, A2, A3, and A4) and wants only A1 to be able to create base relations. To do this, the DBA must issue the following GRANT command in SQL:

GRANT CREATETAB TO A1;

The CREATETAB (create table) privilege gives account A1 the capability to create new database tables (base relations) and is hence an account privilege. This privilege was part of earlier versions of SQL but is now left to each individual system implementation to define. In SQL2 the same effect can be accomplished by having the DBA issue a CREATE SCHEMA command, as follows:

CREATE SCHEMA EXAMPLE AUTHORIZATION A1;

User account A1 can now create tables under the schema called EXAMPLE. To continue our example, suppose that A1 creates the two base relations EMPLOYEE and DEPARTMENT shown in Figure 24.1; A1 is then the owner of these two relations and hence has all the relation privileges on each of them.

Next, suppose that account A1 wants to grant to account A2 the privilege to insert and delete tuples in both of these relations. However, A1 does not want A2 to be able to propagate these privileges to additional accounts. A1 can issue the following command:

GRANT INSERT, DELETE ON EMPLOYEE, DEPARTMENT TO A2;

Notice that the owner account A1 of a relation automatically has the GRANT OPTION, allowing it to grant privileges on the relation to other accounts.
However, account A2 cannot grant INSERT and DELETE privileges on the EMPLOYEE and DEPARTMENT tables because A2 was not given the GRANT OPTION in the preceding command.

Now suppose that A1 decides to revoke the SELECT privilege on the EMPLOYEE relation from A3. (This step presumes that A1 had earlier granted A3 the SELECT privilege together with the GRANT OPTION, and that A3 had passed the privilege on to A4.) A1 then can issue this command:

REVOKE SELECT ON EMPLOYEE FROM A3;

The DBMS must now revoke the SELECT privilege on EMPLOYEE from A3, and it must also automatically revoke the SELECT privilege on EMPLOYEE from A4. This is because A3 granted that privilege to A4, but A3 does not have the privilege any more.
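The propagation rules in this example can be mimicked with a toy model. What follows is not how any DBMS stores authorizations; it is a hedged sketch in plain Python, with a hypothetical PrivilegeTable class that tracks a single privilege (say SELECT on EMPLOYEE), records who granted it to whom, and cascades a revoke. It assumes, as the example implies, that A1 granted the privilege to A3 with the GRANT OPTION and that A3 then granted it to A4. Cyclic grants between non-owner accounts are deliberately not handled.

```python
from collections import defaultdict


class PrivilegeTable:
    """Toy model of one privilege on one relation (hypothetical, for illustration)."""

    def __init__(self, owner):
        self.owner = owner
        # grants[grantee] = set of (grantor, with_grant_option) pairs
        self.grants = defaultdict(set)

    def has_privilege(self, account):
        return account == self.owner or bool(self.grants[account])

    def can_grant(self, account):
        # The owner always holds the GRANT OPTION; others need it from some grantor.
        return account == self.owner or any(opt for _, opt in self.grants[account])

    def grant(self, grantor, grantee, with_grant_option=False):
        if not self.can_grant(grantor):
            raise PermissionError(f"{grantor} does not hold the GRANT OPTION")
        self.grants[grantee].add((grantor, with_grant_option))

    def revoke(self, grantor, grantee):
        # Remove what this grantor gave to this grantee.
        self.grants[grantee] = {g for g in self.grants[grantee] if g[0] != grantor}
        # Cascade: if the grantee can no longer grant, undo everything it granted.
        if not self.can_grant(grantee):
            for other in list(self.grants):
                if any(g[0] == grantee for g in self.grants[other]):
                    self.revoke(grantee, other)


table = PrivilegeTable(owner="A1")
table.grant("A1", "A2")                          # no GRANT OPTION
table.grant("A1", "A3", with_grant_option=True)  # assumed earlier step
table.grant("A3", "A4")                          # A3 propagates the privilege

try:
    table.grant("A2", "A4")
    a2_blocked = False
except PermissionError:
    a2_blocked = True                            # A2 cannot propagate

a4_before = table.has_privilege("A4")
table.revoke("A1", "A3")                         # REVOKE ... FROM A3
a3_after = table.has_privilege("A3")
a4_after = table.has_privilege("A4")

print(a2_blocked, a4_before, a3_after, a4_after)  # True True False False
```

A4 loses the privilege without being named in the revoke, which is the cascading effect the example describes, while A2 keeps the privilege it received directly from A1.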