Overview of a Database Management System

Diagram legend:
• Single boxes: represent system components
• Double boxes: represent in-memory data structures
• Solid lines: indicate control and data flow
• Dashed lines: indicate data flow only

Sources of commands to the DBMS:
1. End users and application programs that ask for data or modify data.
2. A DBA (database administrator): a person or persons responsible for the structure, or schema, of the database. (Schema: the overall description of the database's logical structures, as defined by the DDL.)

Three main components of a DBMS:
1. Storage manager: responsible for storing data, metadata, indexes and logs.
2. Query manager: parses queries, optimizes them by selecting a query plan, and executes the plan on the stored data.
3. Transaction manager: logs database changes to support recovery after a system crash. It also supports concurrent execution of transactions while ensuring:
   • Atomicity: a transaction is performed either completely or not at all.
   • Isolation: a transaction is executed as if there were no other concurrently executing transactions.

Data-Definition Language Commands
• DDL (Data Definition Language): defines the format of each data element in a database. Database tables, for example, are created and dropped using the DDL.
• DDL commands are parsed by a DDL processor and passed to the execution engine.
• They then go through the index / file / record manager to alter the metadata, that is, the schema information for the database. (Metadata: information describing the nature and structure of an organization's data; data about data.)

Overview of Query Processing
• A user or an application program initiates some action that does not affect the schema of the database, but may affect the content of the database (if the action is a modification command) or will extract data from the database (if the action is a query).
• These commands are expressed in a DML (Data Manipulation Language).
• For example, SQL includes a DML (SELECT, INSERT, UPDATE, DELETE).
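The difference between DDL and DML commands can be sketched with Python's built-in sqlite3 module. This is only an illustration, not part of the lecture: SQLite stands in for a generic DBMS, and the `employees` table and the `schema_tables` helper are hypothetical names chosen for the example.

```python
import sqlite3


def schema_tables(conn):
    """Return the table names recorded in SQLite's catalog (its metadata)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: changes the schema (metadata), not the stored data.
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, last_name TEXT)")
tables_after_create = schema_tables(conn)

# DML modification: changes the content, leaves the schema alone.
conn.execute("INSERT INTO employees (last_name) VALUES ('Smith')")
conn.execute("INSERT INTO employees (last_name) VALUES ('Jones')")
tables_after_insert = schema_tables(conn)

# DML query: extracts data without changing schema or content.
names = [
    name
    for (name,) in conn.execute(
        "SELECT last_name FROM employees ORDER BY last_name"
    )
]

# DDL again: dropping the table removes it from the schema.
conn.execute("DROP TABLE employees")
tables_after_drop = schema_tables(conn)

print(tables_after_create, tables_after_insert, names, tables_after_drop)
```

The list of tables changes only after the CREATE and DROP statements, while the INSERT and SELECT statements touch only the rows.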
Overview of Query Processing
• DML statements are handled by two separate subsystems:
   – Answering the query
   – Transaction processing

Answering the Query
1. The query is parsed and optimized by a query compiler.
2. The resulting query plan, or sequence of actions the DBMS will perform to answer the query, is passed to the execution engine.
3. The execution engine issues a sequence of requests for small pieces of data, typically records or tuples of a relation, to a resource manager that knows about data files (holding relations), the format and size of records in those files, and index files, which help find elements of data files quickly.
4. The requests for data are translated into page requests, and these are passed to the buffer manager. (The task of the buffer manager is to bring appropriate portions of the data from secondary storage (disk), where it is kept permanently, into main-memory buffers. Normally the page, or disk block, is the unit of transfer between buffers and disk.)
   – The buffer manager communicates with a storage manager to get data from disk.
   – The storage manager might involve OS commands.
   – Typically, however, the DBMS issues commands directly to the disk controller.

Transaction Processing
• Queries and other DML actions are grouped into transactions, which are units that must be executed atomically and in isolation from one another.
• Often, each query or modification action is a transaction by itself.
• The execution of a transaction must be durable.
• Two major parts of a transaction processor:
1. Concurrency control manager, or scheduler: responsible for assuring atomicity and isolation of transactions.
2. Logging and recovery manager: responsible for the durability of transactions.
• The DBMS offers the guarantee of durability.
• The transaction manager therefore accepts transaction commands from an application, which tell the transaction manager when transactions begin and end, as well as information about the expectations of the application.
• Tasks that the transaction processor performs:
1. Logging: to assure durability, every change in the database is logged separately on disk. The log manager ensures that, when a system failure occurs, a recovery manager will be able to examine the log of changes and restore the database.
2. Concurrency control: assures that the individual actions of multiple transactions are executed in such an order that the net effect is the same as if the transactions had in fact executed in their entirety, one at a time. A typical scheduler does its work by maintaining locks on certain pieces of the database. These locks prevent two transactions from accessing the same piece of data in ways that interact badly. Locks are generally stored in a main-memory lock table.
3. Deadlock resolution: a deadlock is a situation in which no transaction can proceed because each needs something another transaction holds. The transaction manager has the responsibility to intervene and cancel (roll back or abort) one or more transactions to let the others proceed.

Storage and Buffer Management
• A database normally resides in secondary memory.
• Data must be in main memory for any useful operation to be performed.
• It is the storage manager's job to control the placement of data on disk and its movement between disk and main memory.
• In a simple database, the storage manager might be nothing more than the file system of the underlying OS.
• For efficiency, a DBMS normally controls storage on disk directly.
• The storage manager keeps track of the location of files on the disk and obtains the block or blocks containing a file on request from the buffer manager.
• The buffer manager is responsible for partitioning the available main memory into buffers, which are page-sized regions into which disk blocks can be transferred.
• All DBMS components that need information from disk interact with the buffers and the buffer manager, either directly or through the execution engine.
• Kinds of information that the various components may need include:
   – Data: the contents of the database.
   – Metadata: the database schema that describes the structure of, and constraints on, the database.
   – Statistics: information gathered and stored by the DBMS about data properties.

The Query Processor
• The portion of the DBMS that most affects the performance that the user sees is the query processor.
• Two components make up the query processor:
   – Query compiler
   – Execution engine
• Query compiler: translates the query into an internal form called a query plan, which is a sequence of operations to be performed on the data.
• The query compiler consists of three major units:
   (i) Query parser: builds a tree structure from the textual form of the query.
   (ii) Query preprocessor: performs semantic checks on the query and performs some tree transformations to turn the parse tree into a tree of algebraic operators representing the initial query plan.
   (iii) Query optimizer: transforms the initial query plan into the best available sequence of operations on the actual data.
• The query compiler uses metadata and statistics about the data to decide which sequence of operations is likely to be the fastest.
   – For example, the existence of an index, which is a specialized data structure that facilitates access to data given values for one or more components of that data, can make one plan much faster than another.
• Execution engine: has the responsibility for executing each of the steps in the chosen query plan.
   – The execution engine interacts with most of the other components of the DBMS, either directly or through the buffers.
   – Data must get into the buffers in order to be manipulated.
   – It needs to interact with the scheduler to avoid accessing data that is locked, and with the log manager to make sure that all database changes are properly logged.

The ACID Properties of Transactions
• Properly implemented transactions are commonly said to meet the "ACID test", where:
   – "A" = atomicity, the all-or-nothing execution of transactions.
   – "I" = isolation, the fact that each transaction must appear to be executed as if no other transaction is executing at the same time.
   – "D" = durability, the condition that the effect on the database of a transaction must never be lost once the transaction has completed.
   – The remaining "C" stands for consistency. That is, all databases have consistency constraints, or expectations about relationships among data elements (e.g. account balances may not be negative). Transactions are expected to preserve the consistency of the database.

Summary
• End of lecture

Outline of Database System Studies
• Ideas related to database systems can be divided into four broad categories.
1. Design of Databases
   • How does one develop a useful database?
   • What kinds of information go into the database?
   • How is the information structured?
   • What assumptions are made about types or values of data items?
   • How do data items connect?
2. Database Programming
   • How does one express queries and other operations on the database?
   • How does one use other capabilities of a DBMS, such as transactions or constraints, in an application?
   • How is database programming combined with conventional programming?
3. Database System Implementation
   • How does one build a DBMS, including such matters as query processing and organizing storage for efficient access?
   • Storage management: how secondary storage is used effectively to hold data and allow it to be accessed quickly.
   • Query processing: how queries expressed in a very high-level language such as SQL can be executed efficiently.
   • Transaction management: how to support transactions with ACID properties.
4. Information Integration
   • Much of the recent evolution of database systems has been toward capabilities that allow different data sources, which may be databases and/or information resources that are not managed by a DBMS, to work together in a larger whole.

Index
• How Indexes are Implemented
You can use the Indexed property to set a single-field index. An index speeds up queries on the indexed fields as well as sorting and grouping operations. For example, if you search for specific employee names in a LastName field, you can create an index for this field to speed up the search for a specific name.
• Setting
The Indexed property uses the following settings.

Setting               Description
No                    (Default) No index.
Yes (Duplicates OK)   The index allows duplicates.
Yes (No Duplicates)   The index doesn't allow duplicates.

• Remarks
Use the Indexed property to find and sort records by using a single field in a table. The field can hold either unique or non-unique values. For example, you can create an index on an EmployeeID field in an Employees table in which each employee ID is unique, or you can create an index on a Name field in which some names may be duplicates.
• Note
You can't index Memo, Hyperlink, or OLE Object data type fields.
You can create as many indexes as you need.
The indexes are created when you save the table and are automatically updated when you change or add records. You can add or delete indexes at any time in table Design view.

Summary
• Overview of a Database Management System
   – DDL (Data Definition Language) Commands
   – Query Processing
   – Storage and Buffer Management
   – Transaction Processing
   – The Query Processor
• Outline of Database-System Studies
   – Design of a Database
   – Database Programming
   – Database System Implementation
   – Information Integration
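The "Duplicates OK" and "No Duplicates" index settings from the Index slides can be tried outside the Indexed property dialog. The sketch below is an assumption-laden illustration rather than the product behaviour described in the slides: it uses Python's built-in sqlite3 module, where a plain index plays the role of "Yes (Duplicates OK)" and a UNIQUE index the role of "Yes (No Duplicates)". The table, column and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE employees (employee_id INTEGER, last_name TEXT)")

# Analogue of "Yes (No Duplicates)": a unique index rejects repeated values.
conn.execute("CREATE UNIQUE INDEX idx_employee_id ON employees (employee_id)")
# Analogue of "Yes (Duplicates OK)": an ordinary index allows repeated values.
conn.execute("CREATE INDEX idx_last_name ON employees (last_name)")

conn.execute("INSERT INTO employees VALUES (1, 'Smith')")
conn.execute("INSERT INTO employees VALUES (2, 'Smith')")  # duplicate name is accepted

try:
    conn.execute("INSERT INTO employees VALUES (1, 'Jones')")  # duplicate ID
    duplicate_id_rejected = False
except sqlite3.IntegrityError:
    duplicate_id_rejected = True

# Ask SQLite how it would run a lookup by last name.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT employee_id FROM employees WHERE last_name = ?",
    ("Smith",),
).fetchall()
# The last column of each plan row is a human-readable description.
uses_name_index = any("idx_last_name" in row[-1] for row in plan)

row_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(duplicate_id_rejected, uses_name_index, row_count)
```

The exact wording of the plan text differs between SQLite versions, so the sketch only checks whether the index name appears in it.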