Truba Institute of Engineering & Information Technology, Bhopal
BE-205

UNIT V: Database Management System
• Introduction
• File-oriented approach and database approach
• Data models
• Architecture of a database system
• Data independence
• Data dictionary
• DBA
• Primary key
• Data definition language and data manipulation languages

Submitted by: Sugan Patel, Computer Science & Engineering Department

1. Introduction
A database is a collection of stored operational data used by the various applications and/or users of a particular enterprise, or by a set of authorized outside applications and users.

Database Management System: A Database Management System (DBMS) is a software system that manages the execution of users' applications that access and modify database data, so that data security, data integrity, and data reliability are guaranteed for each application, and each application can be written on the assumption that it is the only application active in the database.

Data, from different viewpoints:
– A sequence of characters stored in computer memory or storage
– An interpreted sequence of characters stored in computer memory or storage
– An interpreted set of objects
– A database supports concurrent access to the data

File Systems:
• A file is an uninterpreted, unstructured collection of information.
• File operations: delete, catalog, create, rename, open, close, read, write, find, …
• Access methods: algorithms that implement the operations, together with the internal file organization.
• Examples: a file of customers, a file of students; an access method is the implementation of a set of operations on a file of students or customers.
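To make the file-oriented approach concrete, here is a minimal Python sketch, assuming a hypothetical comma-separated file of student records (held in memory with io.StringIO so the example is self-contained). The function name find_student and the record layout sid,name,age are illustrative assumptions; the point is that the application itself implements the "find" access method and therefore depends on the file's internal layout.

```python
import csv
import io

# Hypothetical "file of students": the record layout (sid,name,age) is known
# only to the application code, not to any DBMS.
STUDENT_FILE = io.StringIO("53666,Jones,18\n53688,Smith,18\n53650,Smith,19\n")


def find_student(f, sid):
    """Access method 'find': sequentially scan the file for the given sid."""
    f.seek(0)  # every search starts again from the beginning of the file
    for record_sid, name, age in csv.reader(f):
        if record_sid == sid:
            return {"sid": record_sid, "name": name, "age": int(age)}
    return None  # no record with this key


print(find_student(STUDENT_FILE, "53688"))
```

If a column were added to the file, the unpacking in find_student would fail and every program written this way would need rewriting; this is the "data is not isolated from the access implementation" problem of file systems.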
File Management System Problems:
• Data redundancy
• Data access: every new request requires a new program
• Data is not isolated from the access implementation
• Concurrent program execution on the same file
• Difficulties with security enforcement
• Integrity issues

Database Applications:
• Airline reservation systems – Data items are: single passenger reservations; information about flights and airports; information about ticket prices and ticket restrictions.
• Banking systems – Data items are accounts, customers, loans, mortgages, balances, etc. Failures are not tolerable, and concurrent access must be provided.
• Corporate records – Data items are: sales, accounts, bill-of-materials records, employees and their dependents.

ADVANTAGES OF A DBMS:
Data independence: Application programs should be as independent as possible from the details of data representation and storage. The DBMS can provide an abstract view of the data to insulate application code from such details.

Efficient data access: A DBMS uses a variety of sophisticated techniques to store and retrieve data efficiently. This feature is especially important if the data is stored on external storage devices.

Data integrity and security: If data is always accessed through the DBMS, the DBMS can enforce integrity constraints on the data. For example, before inserting salary information for an employee, the DBMS can check that the department budget is not exceeded. The DBMS can also enforce access controls that govern what data is visible to different classes of users.

Data administration: When several users share the data, centralizing the administration of data can offer significant improvements.
Experienced professionals who understand the nature of the data being managed, and how different groups of users use it, can be responsible for organizing the data representation to minimize redundancy and for fine-tuning the storage of the data to make retrieval efficient.

Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the data in such a manner that users can think of the data as being accessed by only one user at a time. Further, the DBMS protects users from the effects of system failures.

Reduced application development time: The DBMS supports many important functions that are common to many applications accessing data stored in the DBMS. This, in conjunction with the high-level interface to the data, facilitates quick development of applications. Such applications are also likely to be more robust than applications developed from scratch, because many important tasks are handled by the DBMS instead of being implemented by the application.

Data Levels and their Roles
Physical – corresponds to the first view of data: how data is stored, how it is accessed, how it is modified, whether it is ordered, how it is allocated to computer memory and/or peripheral devices, and how data items are actually represented (ASCII, EBCDIC, …). The physical schema specifies additional storage details. Essentially, the physical schema summarizes how the relations described in the conceptual schema are actually stored on secondary storage devices such as disks and tapes. We must decide what file organizations to use to store the relations, and create auxiliary data structures called indexes to speed up data retrieval operations.

Conceptual – corresponds to the second view of data: what we want the data to express, what relationships between data we must express, what "story" the data tells, and whether all the data necessary for that "story" is present.
The conceptual schema (sometimes called the logical schema) describes the stored data in terms of the data model of the DBMS. In a relational DBMS, the conceptual schema describes all relations that are stored in the database. In our sample university database, these relations contain information about entities, such as students and faculty, and about relationships, such as students' enrollment in courses. All student entities can be described using records in a Students relation. In fact, each collection of entities and each collection of relationships can be described as a relation, leading to the following conceptual schema:

Students(sid: string, name: string, login: string, age: integer, gpa: real)
Faculty(fid: string, fname: string, sal: real)
Courses(cid: string, cname: string, credits: integer)
Rooms(rno: integer, address: string, capacity: integer)
Enrolled(sid: string, cid: string, grade: string)
Teaches(fid: string, cid: string)
Meets_In(cid: string, rno: integer, time: string)

The choice of relations, and the choice of fields for each relation, is not always obvious, and the process of arriving at a good conceptual schema is called conceptual database design.

View – corresponds to the third view of data: what part of the data is seen by a specific application? External schemas, which are usually also expressed in terms of the data model of the DBMS, allow data access to be customized (and authorized) at the level of individual users or groups of users. External schema design is guided by end-user requirements. For example, we might want to allow students to find out the names of faculty members teaching courses, as well as course enrollments. This can be done by defining the following view:

Courseinfo(cid: string, fname: string, enrollment: integer)
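As a rough illustration, the sketch below writes part of the conceptual schema above as SQL DDL and the Courseinfo external schema as an SQL view, using Python's built-in sqlite3 module with an in-memory database. The SQL column types (TEXT, INTEGER, REAL), the join used to compute enrollment, and the sample rows are assumptions made for the example; another DBMS may use different type names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: part of the university schema above, written as SQL DDL.
conn.executescript("""
CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT,
                       age INTEGER, gpa REAL);
CREATE TABLE Faculty  (fid TEXT PRIMARY KEY, fname TEXT, sal REAL);
CREATE TABLE Courses  (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (sid TEXT REFERENCES Students, cid TEXT REFERENCES Courses,
                       grade TEXT);
CREATE TABLE Teaches  (fid TEXT REFERENCES Faculty, cid TEXT REFERENCES Courses);

-- External schema: students see who teaches a course and how many are
-- enrolled, but never see the sal column of Faculty.
CREATE VIEW Courseinfo AS
    SELECT t.cid AS cid, f.fname AS fname, COUNT(e.sid) AS enrollment
    FROM Teaches AS t
    JOIN Faculty AS f ON f.fid = t.fid
    LEFT JOIN Enrolled AS e ON e.cid = t.cid
    GROUP BY t.cid, f.fname;
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
                 [("S1", "Jones", "jones@cs", 18, 3.4),
                  ("S2", "Smith", "smith@ee", 19, 3.2)])
conn.execute("INSERT INTO Faculty VALUES ('F1', 'Brown', 90000)")
conn.execute("INSERT INTO Courses VALUES ('DB101', 'Databases', 4)")
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                 [("S1", "DB101", "A"), ("S2", "DB101", "B")])
conn.execute("INSERT INTO Teaches VALUES ('F1', 'DB101')")

cursor = conn.execute("SELECT * FROM Courseinfo")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns, rows)  # ['cid', 'fname', 'enrollment'] [('DB101', 'Brown', 2)]
```

Applications written against Courseinfo see only the three view columns; the salary stored in the conceptual schema stays hidden, which is how an external schema customizes and authorizes access.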
3. DATA MODELS

E-R Model
• E-R modeling is a conceptual-level model.
• Entities are real-world objects about which we collect data.
• Attributes describe the entities.
• Relationships are associations among entities.
• Entity set – a set of entities of the same type.
• Relationship set – a set of relationships of the same type.
• Relationship sets may have descriptive attributes.
• Represented by E-R diagrams.

Object-oriented Model
• Uses E-R modeling as a basis, but is extended to include encapsulation and inheritance.
• Objects have both state and behavior.
• State is defined by attributes.
• Behavior is defined by methods (functions or procedures).
• The designer defines classes with attributes, methods, and relationships.
• A class constructor method creates object instances.
• Each object has a unique object ID.
• Classes are related by class hierarchies.
• Database objects have persistence.
• Both a conceptual-level and a logical-level model.

The Hierarchical Data Model
The Hierarchical Data Model structures data in a tree of records, with each record having one parent record and many children. It can be represented as follows:
[Figure: hierarchical data model]

A hierarchical database consists of the following:
1. It contains nodes connected by branches.
2. The top node is called the root.
3. If multiple nodes appear at the top level, the nodes are called root segments.
4. The parent of node nx is a node directly above nx and connected to nx by a branch.
5. Each node (with the exception of the root) has exactly one parent.
6. The child of node nx is the node directly below nx and connected to nx by a branch.
7. One parent may have many children.

By introducing data redundancy, complex network structures can also be represented as hierarchical databases. This redundancy is eliminated in the physical implementation by including a 'logical child'. The logical child contains no data but uses a set of pointers to direct the database management system to the physical child in which the data is actually stored. Associated with a logical child are a physical parent and a logical parent. The logical parent provides an alternative (and possibly more efficient) path to retrieve logical child information.

The Network Data Model
The Network Data Model uses a lattice structure in which a record can have many parents as well as many children. It can be represented as follows:
[Figure: network data model]

Like the Hierarchical Data Model, the Network Data Model also consists of nodes and branches, but a child may have multiple parents within the network structure instead of being restricted to just one.

Relational Model
• Record- and table-based model.
• Relational database modeling is a logical-level model.
• Proposed by E. F. Codd.
• Based on mathematical relations.
• Uses relations, represented as tables.
• Columns of tables represent attributes.
• Tables represent relationships as well as entities.
• Successor to the earlier record-based models (network and hierarchical).
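The essential difference between the hierarchical and network models is how many parents a record may have. The Python sketch below illustrates this under simplifying assumptions: each model is a hypothetical mapping from a record name to the set of its parent records, and the helper to_hierarchy (a name invented for this example) turns a network into a hierarchy by copying every multi-parent record once per parent, i.e. the redundancy described above. It assumes that multi-parent records are leaves.

```python
# Each structure maps a record to the set of its parent records.
# Record names are hypothetical examples.
hierarchy = {
    "CSE": set(),            # root: no parent
    "DBMS": {"CSE"},
    "OS": {"CSE"},
    "S1": {"DBMS"},          # every non-root record has exactly one parent
}

network = {
    "CSE": set(),
    "DBMS": {"CSE"},
    "OS": {"CSE"},
    "S1": {"DBMS", "OS"},    # a child with two parents: allowed in a network
}


def is_hierarchical(parents):
    """True if no record has more than one parent (the tree property)."""
    return all(len(p) <= 1 for p in parents.values())


def to_hierarchy(parents):
    """Represent a network as a hierarchy by duplicating each record that has
    several parents, one copy per parent (assumes such records are leaves)."""
    tree = {}
    for record, ps in parents.items():
        if len(ps) <= 1:
            tree[record] = set(ps)
        else:
            for p in sorted(ps):
                tree[f"{record}@{p}"] = {p}  # redundant copy under parent p
    return tree


print(is_hierarchical(hierarchy), is_hierarchical(network))  # True False
print(sorted(to_hierarchy(network)))  # ['CSE', 'DBMS', 'OS', 'S1@DBMS', 'S1@OS']
```

The copies S1@DBMS and S1@OS hold the same student data twice; a real hierarchical DBMS avoids storing it twice by making one of them a logical child that only points to the physical record.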
4. DBMS ARCHITECTURE
A DBMS (Database Management System) acts as an interface between the user and the database. The user requests the DBMS to perform various operations (insert, delete, update and retrieval) on the database. The components of the DBMS perform these requested operations on the database and provide the necessary data to the users. The various components of a DBMS are described below:

1. DDL Compiler – The Data Description Language compiler processes schema definitions specified in the DDL. These definitions include metadata such as the names of the files, data items, storage details of each file, mapping information, constraints, etc.

2. DML Compiler and Query Optimizer – DML commands such as insert, update, delete and retrieve from the application program are sent to the DML compiler for compilation into object code for database access. The query optimizer then optimizes this object code so that the query is executed in the best way, and sends it to the data manager.

3. Data Manager – The Data Manager is the central software component of the DBMS, also known as the Database Control System. The main functions of the Data Manager are:
• Converting operations in users' queries, coming from the application programs or from the combination of the DML compiler and query optimizer (together known as the Query Processor), from the user's logical view to the physical file system.
• Controlling access to DBMS information that is stored on disk.
• Controlling the handling of buffers in main memory.
• Enforcing constraints to maintain the consistency and integrity of the data.
• Synchronizing the simultaneous operations performed by concurrent users.
• Controlling backup and recovery operations.
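One way to watch the DDL compiler and the query optimizer cooperate is SQLite's EXPLAIN QUERY PLAN statement, driven here from Python's sqlite3 module. This is a sketch assuming SQLite as the DBMS: the wording of the plan text is SQLite-specific and varies between versions, and the table, index and helper names are hypothetical. The same DML query is planned before and after an index is defined with DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: the schema definition is compiled and recorded in SQLite's catalog.
conn.execute("CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, age INTEGER)")
# DML: hypothetical sample rows with ages 17..21.
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                 [(f"S{i}", f"name{i}", 17 + i % 5) for i in range(1000)])

query = "SELECT name FROM Students WHERE age = 19"


def plan(sql):
    """Return the 'detail' text (last column) of EXPLAIN QUERY PLAN rows."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before_plan = plan(query)                 # no index on age: a full table scan
before_rows = sorted(conn.execute(query).fetchall())

# DDL again: a new access path is added at the internal (physical) level.
conn.execute("CREATE INDEX idx_students_age ON Students(age)")

after_plan = plan(query)                  # the optimizer can now use the index
after_rows = sorted(conn.execute(query).fetchall())

print(before_plan)
print(after_plan)
print(before_rows == after_rows)          # True: same answer, different access path
```

The query text never changes; only the optimizer's chosen access path does. This is also an instance of the physical data independence described in the next section.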
5. Data Independence
Data independence means that the higher levels of the database model are designed to be unaffected by changes to the lower levels (the conceptual and internal levels). There are two types of data independence:
• Logical data independence
• Physical data independence

Logical data independence means that the external schema is unaffected by changes to the conceptual schema. For example, a new field can be added to a table (relation) without requiring any changes to application programs.

Physical data independence means that the conceptual schema is unaffected by changes made to the internal schema. An example of a change to the internal schema would be changing the storage device used to store the database data. This would not affect the conceptual or external schemas.

6. Data Dictionary
The data dictionary is a repository of descriptions of the data in the database. It contains information about:
• Data – names of the tables, names of the attributes of each table, lengths of attributes, and the number of rows in each table.
• Relationships between database transactions and the data items referenced by them, which are useful in determining which transactions are affected when certain data definitions are changed.
• Constraints on data, i.e. the range of values permitted.
• Detailed information on the physical database design, such as storage structures, access paths, and file and record sizes.
• Access authorization – a description of database users, their responsibilities and their access rights.
• Usage statistics, such as the frequency of queries and transactions.

The data dictionary is used to control data integrity, database operation and accuracy.
It may be used as an important part of the DBMS.

Importance of Data Dictionary – A data dictionary is necessary in databases for the following reasons:
• It improves the DBA's control over the information system and users' understanding of how to use the system.
• It helps in documenting the database design process by storing documentation of the result of every design phase and of the design decisions.
• It helps in searching the views on the database and the definitions of those views.
• It provides great assistance in producing a report of which data elements (i.e. data values) are used in the various programs.
• It promotes data independence, i.e. when structures in the database are added or modified, application programs are not affected.

7. Database Administrator
The database administrator (DBA) is the person (or group of people) responsible for overall control of the database system. The DBA's responsibilities include the following:
• Deciding the information content of the database, i.e. identifying the entities of interest to the enterprise and the information to be recorded about those entities. This is defined by writing the conceptual schema using the DDL.
• Deciding the storage structure and access strategy, i.e. how the data is to be represented, by writing the storage structure definition. The associated internal/conceptual mapping must also be specified using the DDL.
• Liaising with users, i.e. ensuring that the data they require is available and writing the necessary external schemas and conceptual/external mappings (again using the DDL).
• Defining authorization checks and validation procedures. Authorization checks and validation procedures are extensions to the conceptual schema and can be specified using the DDL.
• Defining a strategy for backup and recovery.
  For example, periodic dumping of the database to a backup tape, and procedures for reloading the database from the backup. Also, use of a log file, where each log record contains the values of database items before and after a change and can be used for recovery purposes.
• Monitoring performance and responding to changes in requirements, i.e. changing details of storage and access, thereby organizing the system so as to get the performance that is 'best for the enterprise'.

Data Redundancy
In non-database systems each application has its own private files. This can often lead to redundancy in stored data, with resultant waste of storage space. In a database the data is integrated. The database may be thought of as a unification of several otherwise distinct data files, with any redundancy among those files partially or wholly eliminated.

Data integration is generally regarded as an important characteristic of a database. The avoidance of redundancy should be an aim; however, the vigour with which this aim should be pursued is open to question.

Redundancy is:
• direct, if a value is a copy of another;
• indirect, if the value can be derived from other values.
Redundancy simplifies retrieval but complicates update; conversely, integration makes retrieval slower and updates easier.

Data redundancy can lead to inconsistency in the database unless it is controlled.
• The system should be aware of any data duplication; the system is responsible for ensuring that updates are carried out correctly.
• A database with uncontrolled redundancy can be in an inconsistent state: it can supply incorrect or conflicting information.
• A given fact represented by a single entry cannot result in inconsistency. Few systems are capable of propagating updates, i.e. most systems do not support controlled redundancy.
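The inconsistency caused by uncontrolled direct redundancy can be shown with plain Python dictionaries standing in for files. This is only a sketch: the two "private files", the customer record and the accessor functions are hypothetical, and a real DBMS would store the integrated record in a table rather than a dictionary.

```python
# Hypothetical example of direct redundancy: two applications keep
# private files that both record the same customer's address.
billing_file = {"C1": {"name": "Asha", "address": "12 MG Road"}}
shipping_file = {"C1": {"name": "Asha", "address": "12 MG Road"}}

# The billing application updates only its own copy...
billing_file["C1"]["address"] = "7 Lake View"
# ...so the two copies of the same fact now conflict.
inconsistent = billing_file["C1"]["address"] != shipping_file["C1"]["address"]
print("file approach inconsistent:", inconsistent)  # True

# Integrated approach: the fact is stored once, and both applications read it
# through the same shared record.
customers = {"C1": {"name": "Asha", "address": "12 MG Road"}}


def billing_address(cid):
    return customers[cid]["address"]


def shipping_address(cid):
    return customers[cid]["address"]


customers["C1"]["address"] = "7 Lake View"  # a single update
print(billing_address("C1") == shipping_address("C1"))  # True
```

Because the integrated version holds the address in a single entry, one update is seen by both applications and no conflicting copy can arise.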
Data Integrity
This describes the problem of ensuring that the data in the database is accurate. Inconsistencies between two entries representing the same 'fact' are an example of a lack of integrity (caused by redundancy in the database). Integrity constraints can be viewed as a set of assertions to be obeyed when updating a database, in order to preserve an error-free state. Even if redundancy is eliminated, the database may still contain incorrect data. Important integrity checks are checks on data items and on record types.

9. DDL
Data Definition Language (DDL) statements are used to define the database structure or schema. Some examples:
o CREATE - creates objects in the database
o ALTER - alters the structure of the database
o DROP - deletes objects from the database
o TRUNCATE - removes all records from a table, including all space allocated for the records
o COMMENT - adds comments to the data dictionary
o RENAME - renames an object

DML
Data Manipulation Language (DML) statements are used for managing data within schema objects. Some examples:
o SELECT - retrieves data from a database
o INSERT - inserts data into a table
o UPDATE - updates existing data within a table
o DELETE - deletes records from a table (all records if no condition is given); the space for the records remains
o MERGE - UPSERT operation (insert or update)
o CALL - calls a PL/SQL or Java subprogram
o EXPLAIN PLAN - explains the access path to data
o LOCK TABLE - controls concurrency

DCL
Data Control Language (DCL) statements are used to control access to data. Some examples:
o GRANT - gives users access privileges to the database
o REVOKE - withdraws access privileges given with the GRANT command

TCL
Transaction Control Language (TCL) statements are used to manage the changes made by DML statements. They allow statements to be grouped together into logical transactions.
o COMMIT - saves the work done
o SAVEPOINT - identifies a point in a transaction to which you can later roll back
o ROLLBACK - restores the database to its state as of the last COMMIT
o SET TRANSACTION - changes transaction options, such as the isolation level and which rollback segment to use
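The sketch below runs several of the statement types listed above, assuming SQLite (through Python's built-in sqlite3 module) as the DBMS: DDL with an integrity constraint, DML, and the TCL statements BEGIN, SAVEPOINT, ROLLBACK TO and COMMIT. The Accounts table, the savepoint name and the data are hypothetical. SQLite is an embedded database without user accounts, so the DCL statements GRANT and REVOKE cannot be demonstrated with it, and it has no MERGE or TRUNCATE statement (DELETE without a WHERE clause plays the role of TRUNCATE).

```python
import sqlite3

# isolation_level=None: the sqlite3 module issues no implicit BEGIN,
# so the transaction-control statements below are sent exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)

# DDL: define the schema, including an integrity constraint.
conn.execute("""CREATE TABLE Accounts (
    acc_no  TEXT PRIMARY KEY,
    balance REAL NOT NULL CHECK (balance >= 0))""")

# DML inside an explicit transaction, made permanent by COMMIT (TCL).
conn.execute("BEGIN")
conn.execute("INSERT INTO Accounts VALUES ('A1', 500), ('A2', 100)")
conn.execute("COMMIT")

# SAVEPOINT / ROLLBACK TO: undo only part of a transaction.
conn.execute("BEGIN")
conn.execute("UPDATE Accounts SET balance = balance - 50 WHERE acc_no = 'A1'")
conn.execute("SAVEPOINT before_bonus")
conn.execute("UPDATE Accounts SET balance = balance + 1000 WHERE acc_no = 'A2'")
conn.execute("ROLLBACK TO before_bonus")  # discards only the bonus update
conn.execute("COMMIT")                    # keeps the first update

# The CHECK constraint rejects an update that would make a balance negative.
rejected = False
try:
    conn.execute("UPDATE Accounts SET balance = balance - 1000 WHERE acc_no = 'A2'")
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)

rows = conn.execute("SELECT acc_no, balance FROM Accounts ORDER BY acc_no").fetchall()
print(rows)  # [('A1', 450.0), ('A2', 100.0)]
```

The final state keeps the committed withdrawal from A1, drops the rolled-back bonus to A2, and is unaffected by the update that violated the integrity constraint.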