Syllabus
1. Introduction
2. ER Model
3. Relational Model
4. SQL
5. Integrity and Security
6. Relational Database Design
7. File Structure, Indexing and Hashing
8. Transaction
9. Concurrency Control
10. Recovery System

Chapter 1: Introduction
• What Is a Database?
• Purpose of Database Systems
• View of Data
• Data Models
• Data Definition Language
• Data Manipulation Language
• Transaction Management
• Storage Management
• Database Administrator
• Database Users
• Overall System Structure

Definition of Database
• A shared collection of logically related data, designed to meet the information needs of multiple users in an organization.
• The term “database” is often erroneously used as a synonym for “database management system (DBMS)”.
• A collection of data such as part numbers, product codes, and customer information. The term usually refers to data organized and stored on a computer so that it can be searched and retrieved by a computer program.
• A structure that stores data together with metadata, i.e., data about data. More generally, an organized collection of information.
• A collection of information organized and presented to serve a specific purpose. (A telephone book is a common database.) A computerized database is a continuously updated, organized file of machine-readable information that can be rapidly searched and retrieved by computer.
• An organized collection of information in computerized format.
• A collection of related information about a subject, organized in a useful manner that provides a base or foundation for procedures such as retrieving information, drawing conclusions, and making decisions.

Example
Thing            Data (facts/figures)
Cricket player   Country, name, date of birth, specialty, matches played, runs, etc.
Scholar          Name, date of birth, age, country, field, books published, etc.
Food             Name, ingredients, taste, preferred time, origin, etc.
Vehicle          Registration number, make, owner, type, price, etc.
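To make the definitions above concrete, here is a minimal sketch using Python’s built-in sqlite3 module (our choice; the slides do not name a particular DBMS). It stores a few records for the “Vehicle” row of the example table, retrieves them with a program, and reads back the metadata that SQLite keeps about the table in its sqlite_master catalog. The table, column, and function names and all record values are hypothetical, chosen only for illustration.

```python
import sqlite3


def build_vehicle_db() -> sqlite3.Connection:
    """Create an in-memory database holding a few hypothetical vehicle records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vehicle ("
        " reg_no TEXT PRIMARY KEY,"
        " make TEXT, owner TEXT, vehicle_type TEXT, price REAL)"
    )
    # Illustrative sample data, not taken from the slides.
    conn.executemany(
        "INSERT INTO vehicle VALUES (?, ?, ?, ?, ?)",
        [
            ("REG-001", "MakeA", "Asha", "car", 550000.0),
            ("REG-002", "MakeB", "Ravi", "motorcycle", 90000.0),
        ],
    )
    conn.commit()
    return conn


def find_by_owner(conn: sqlite3.Connection, owner: str) -> list:
    """Search and retrieve: return (reg_no, make) for every vehicle of `owner`."""
    return conn.execute(
        "SELECT reg_no, make FROM vehicle WHERE owner = ? ORDER BY reg_no",
        (owner,),
    ).fetchall()


def table_metadata(conn: sqlite3.Connection) -> list:
    """Data about data: SQLite records each table's name and definition in sqlite_master."""
    return conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()


if __name__ == "__main__":
    db = build_vehicle_db()
    print(find_by_owner(db, "Ravi"))  # [('REG-002', 'MakeB')]
    print(table_metadata(db)[0][0])   # vehicle
```

The same connection holds both the data (the vehicle rows) and the metadata describing it (the stored CREATE TABLE text), which is the distinction the definitions draw.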
Purpose / Need of Database Systems
Let us discuss an example.

Example: Personal Calendar
• We might start by building a file with the following structure:

    What     Day    When   Who          Where
    Lunch    10/24  1pm    Rick         Joe’s Diner
    CS123    10/25  9am    Dr. Egghead  Morris234
    Biking   10/26  9am    Jane         Jane’s house
    Dinner   10/26  6pm    Jane         Café Le Boeuf

• This text file is easy to deal with, so there’s no need for a DBMS!

Problem 1: Data Organization
• Consider the all-important “who” field. Do we also want to keep e-mail addresses, telephone numbers, etc.?
• Expand our file to look like:
    What   When   Who-name   Who-email   …   Who-tel   …   Where
• Now we are keeping our address book in our calendar, and doing so redundantly.

“Link” Calendar with Address Book?
• There are two conceptual “entities”, contact information and calendar, with a relationship between them that links people in the calendar to their contact information.
• This link could be based on something as simple as the person’s name.

Problem 2: Efficiency
• A personal address book probably has fewer than one hundred entries, but there are still things we’d like to do quickly and efficiently:
  – “Give me all appointments on 10/28.”
  – “When am I next meeting Jim?”
• We want to “program” these requests as quickly as possible.
• We want these programs to execute efficiently.
• What would happen if you were using a “corporate” calendar with hundreds of thousands of entries?

Problem 3: Concurrency and Reliability
• Suppose other people are allowed to access and modify your calendar. How do we stop two people from changing the file at the same time and leaving it in a physical (or logical) mess?
• Suppose the system crashes while we are changing the calendar. How do we recover our work?

Example
Suppose a manager schedules a meeting with his staff today at 3:00 pm, and at the same time his secretary schedules him to meet with the Chairman. They both see that the time is open, but presumably only one of the two meetings will show on the calendar later.

What is a DBMS?
• A database (DB) is a large, integrated collection of data.
• A DB models a real-world enterprise; it contains information about a particular enterprise.
• A database management system (DBMS) is a software package designed to store and manage databases, i.e., a set of programs to access the data.
• A DBMS provides an environment that is simultaneously convenient, secure, and efficient to use.
• It is the software tool used to manage the database and its users.
• A DBMS consists of different components or subsystems.
• Each subsystem or component of the DBMS performs different functions.
• So a DBMS is a collection of different programs, all of which work jointly to manage the data stored in the database and its users.
• The database is the collection of data, the DBMS is the tool that manages this data, and together they are called a database system.

What the DBMS Is About
• Organization of data
• Efficient retrieval of data
• Reliable storage of data
• Maintaining consistent data
All these topics are interrelated.

Drawbacks of File Systems
• In the early days, database applications were built directly on top of file systems.
• Drawbacks of using file systems to store data:
  – Data redundancy and inconsistency
      • Multiple file formats; duplication of information in different files
  – Difficulty in accessing data
      • A new program must be written to carry out each new task
  – Data isolation: multiple files and formats
  – Integrity problems
      • Integrity constraints (e.g., account balance > 0) become “buried” in program code rather than being stated explicitly
      • Hard to add new constraints or change existing ones
  – Atomicity of updates
      • Failures may leave the database in an inconsistent state, with partial updates carried out
      • Example: a transfer of funds from one account to another should either complete or not happen at all
  – Concurrent access by multiple users
      • Concurrent access is needed for performance
      • Uncontrolled concurrent accesses can lead to inconsistencies
        Example: two people reading a balance and updating it at the same time
  – Security problems
      • Hard to provide user access to some, but not all, data
• Database systems offer solutions to all the above problems.

Database Applications
  – Banking: all business transactions
  – Airlines: reservations, schedules
  – Universities: registration, grades
  – Sales: customers, products, purchases
  – Manufacturing: production, inventory, orders, supply chain
  – Human resources: employee records, salaries, tax deductions

Data and Information
• Data is the collection of raw facts collected from a specific environment for a specific purpose.
• Data by itself does not show anything about its environment.
• To get the desired kinds of results from data, we transform it into information by applying some processing to it.
• Once data has been processed by different methods, it is converted into a meaningful form, and that form of the data is called information.

Levels of Abstraction
Because many DBMS users are not computer trained, developers hide the complexity from users through levels of abstraction, to simplify users’ interactions with the system.
• Physical level: the lowest level; describes how a record (e.g., customer) is stored.
• Logical level: the next higher level; describes what data are stored in the database and the relationships among the data.
    type customer = record
        customer_id : string;
        customer_name : string;
        customer_street : string;
        customer_city : string;
    end;
• View level: the highest level; describes only part of the entire database. The DBMS may provide many views for the same database.

View of Data
[Figure: An architecture for a database system (data abstraction)]
  – View level (View 1, View 2, …, View n): what data do users and application programs see?
  – Logical level: what data is stored? Describes data properties such as data semantics and data relationships.
  – Physical level: how is the data actually stored? E.g., are we using disks? Which file system?

Instances and Schemas
Similar to types and variables in programming languages.
• Schema: the logical structure, or overall design, of the database
  – Example: the database consists of information about a set of customers and accounts and the relationship between them
  – Analogous to the type information of a variable in a program
  – Physical schema: database design at the physical level
  – Logical schema: database design at the logical level
• Instance: the actual content of the database at a particular point in time
  – Analogous to the value of a variable

Data Independence
• Physical data independence – the ability to modify the physical schema without changing the logical schema
  – Applications depend on the logical schema
  – In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.
• Logical data independence – the ability to modify the logical schema without causing application programs to be rewritten. It is harder to achieve than physical data independence, since application programs are heavily dependent on the logical structure of the data they access.

Database Language
A DBMS provides languages for defining the database and for accessing and manipulating the data organized by the appropriate data model:
• one to specify the database schema, storage structure, and access methods (DDL), and
• another to express database queries and updates (DML).

Data Definition Language (DDL)
• Specification notation for defining the database schema
• The DDL compiler generates a set of tables stored in a data dictionary, a file that contains metadata, i.e., data about data.
• The data dictionary is consulted before reading or modifying the actual data.

Data Manipulation Language (DML)
• Language for accessing and manipulating the data organized by the appropriate data model
  – DML is also known as a query language
• Two classes of languages:
  – Procedural: the user specifies what data is required and how to get those data
  – Declarative (nonprocedural): the user specifies what data is required without specifying how to get those data
• SQL is the most widely used query language.

Data Modeling
• A data model is a collection of concepts for describing data properties and domain knowledge:
  – Data relationships
  – Data semantics
  – Data constraints
• Data models include:
  – Entity-Relationship data model (mainly for database design)
  – Relational model
  – Object-based data models (object-oriented and object-relational)
  – Semistructured data model (XML)
  – Other, older models:
      • Network model
      • Hierarchical model
• Relational model (compared with the E-R model):
  – Only one abstract concept
  – Closer to the physical representation on disk
  – Normalization

Entity-Relationship Model
• Models an enterprise as a collection of entities and relationships
  – Entity: a “thing” or “object” in the enterprise that is distinguishable from other objects
      • Described by a set of attributes
  – Relationship: an association among several entities
• E.g., each employee is an entity described by empno, empname, designation, etc.
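A minimal sketch, assuming Python’s sqlite3 module, of the employee example: the employee entity set becomes a table whose columns are its attributes empno, empname, and designation (the sample rows are hypothetical). It also contrasts the two DML classes described above. The declarative function states what is wanted in SQL and leaves the “how” to the DBMS; the procedural function spells out the “how” itself by scanning every record and testing it in a loop. Treat the procedural version as an analogy for the idea, not as an example of a real procedural query language.

```python
import sqlite3


def make_employee_table(rows: list) -> sqlite3.Connection:
    """Represent the employee entity set as a table: one column per attribute."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee ("
        " empno INTEGER PRIMARY KEY,"
        " empname TEXT NOT NULL,"
        " designation TEXT)"
    )
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    return conn


def managers_declarative(conn: sqlite3.Connection) -> list:
    """Declarative: state WHAT is required; the DBMS decides how to find it."""
    cur = conn.execute(
        "SELECT empname FROM employee WHERE designation = ? ORDER BY empno",
        ("Manager",),
    )
    return [name for (name,) in cur]


def managers_procedural(conn: sqlite3.Connection) -> list:
    """Procedural style (analogy): the program itself spells out HOW,
    by scanning every employee record and testing each one."""
    result = []
    for _empno, empname, designation in conn.execute(
        "SELECT empno, empname, designation FROM employee ORDER BY empno"
    ):
        if designation == "Manager":
            result.append(empname)
    return result


if __name__ == "__main__":
    # Hypothetical employee entities.
    staff = [(1, "Meena", "Manager"), (2, "Arun", "Clerk"), (3, "Kiran", "Manager")]
    db = make_employee_table(staff)
    print(managers_declarative(db))  # ['Meena', 'Kiran']
    print(managers_procedural(db))   # ['Meena', 'Kiran']
```

Both functions return the same answer; the difference is who decides the access strategy, which is exactly what lets the DBMS choose indices or other plans for the declarative form.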
• Entity-relationship model (compared with the relational model):
  – Diagrammatic representation
  – Easier to work with
  – Syntax not important, but remember the “meaning”
  – Remember what you can model
• Entity set: the set of all entities of the same type
• Relationship set: the set of all relationships of the same type
• Mapping cardinality: the number of entities to which another entity can be associated via a relationship set
• The overall logical design of a database can be expressed graphically by an E-R diagram, which has the following components:
  – Rectangles: entity sets
  – Ellipses: attributes of an entity
  – Diamonds: relationships among entity sets
  – Lines: link attributes to entity sets and entity sets to relationship sets
• Each component of an E-R diagram is labeled with the entity or relationship that it represents.
[Figure: Example of schema in the entity-relationship model]

Relational Model
• It uses a collection of tables to represent both data and the relationships among those data.
• Each table has multiple columns, and each column has a unique name.

Other Models
Network Model
• Data are represented by a collection of records, and relationships among those data are represented by links, which can be viewed as pointers.
• Records are organized as a collection of arbitrary graphs.
Hierarchical Model
• Data are represented by a collection of records, and relationships among those data are represented by links, which can be viewed as pointers.
• Records are organized as a collection of trees rather than arbitrary graphs.

Database Users
Users are differentiated by the way they expect to interact with the system.
• End users: access the database for querying, updating, and generating reports
  – Casual end users: occasionally access the database, but may need different information each time. They learn only a few facilities, which they may use repeatedly.
    They use a sophisticated database query language to specify their requests and are typically middle- or high-level managers or other occasional browsers.
• Application programmers: interact with the system through DML calls
• Sophisticated users: form requests in a database query language
• Specialized users: write specialized database applications that do not fit into the traditional data-processing framework
• Naïve users: invoke one of the permanent application programs that have been written previously
  – E.g., people accessing the database over the web, bank tellers, clerical staff
• System analysts and application programmers
  – System analysts determine the requirements of end users, especially naive and parametric end users, and develop specifications for canned transactions that meet these requirements.
  – Application programmers implement these specifications as programs; then they test, debug, document, and maintain these canned transactions.
• Workers behind the scene
  – Typically do not use the database for their own purposes
  – DBMS system designers and implementers: design and implement the DBMS modules (for implementing the catalog, query language, interface processors, data access, concurrency control, recovery, and security) and interfaces as a software package
  – Tool developers: tools are optional packages that are often purchased separately. They include packages for database design, performance monitoring, natural-language or graphical interfaces, prototyping, simulation, and test data generation.
  – Operators and maintenance personnel: system administration personnel who are responsible for the actual running and maintenance of the hardware and software environment for the database system

Database Administrator
• Coordinates all the activities of the database system; the database administrator has a good understanding of the enterprise’s information resources and needs.
• The database administrator’s duties include:
  – Schema definition
  – Storage structure and access method definition
  – Schema and physical organization modification
  – Granting user authority to access the database
  – Specifying integrity constraints
  – Monitoring performance and responding to changes in requirements

Database Actors
• Database administrators
  – In a database environment, the primary resource is the database itself and the secondary resource is the DBMS and related software.
  – The DBA is responsible for authorizing access to the database, coordinating and monitoring its use, and acquiring software and hardware resources as needed.
• Database designers
  – Responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data; these tasks are undertaken before the database is actually implemented and populated with data.
  – Communicate with all prospective database users in order to understand their requirements.
  – Develop a view of the database that meets the data and processing requirements of each group of users.
  – These views are then analyzed and integrated with the views of other user groups. The final database design must be capable of supporting the requirements of all user groups.

Overall Database System Structure
Query Processor Components
• DML compiler: translates DML statements in a query language into low-level instructions that the query evaluation engine understands
• DDL interpreter: interprets DDL statements and records them in a set of tables containing metadata
• Query evaluation engine: executes the low-level instructions generated by the DML compiler

Storage Manager Components
• Authorization and integrity manager: tests for satisfaction of integrity constraints and checks the authority of users to access the data
• Transaction manager: ensures that the database remains in a consistent state despite system failures, and that concurrent transaction executions proceed without conflicting
• File manager: manages the allocation of space on disk storage and the data structures used to represent information on disk
• Buffer manager: responsible for fetching data from disk storage into main memory, and for deciding what data to cache in memory

Data structures required as part of the physical implementation:
• Data files: store the database itself
• Data dictionary: stores metadata about the structure of the database
• Indices: provide fast access to data items that hold particular values
• Statistical data: statistical information about the data in the database, used by the strategy selector when choosing how to execute a query

[Figure: Database system architecture]

Storage Management
• The storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
• The storage manager is responsible for the following tasks:
  – Interaction with the file manager
  – Efficient storing, retrieving, and updating of data
• Issues:
  – Storage access
  – File organization
  – Indexing and hashing

Transaction Management
• A transaction is a collection of operations that performs a single logical function in a database application.
• The transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.
• The concurrency-control manager controls the interaction among concurrent transactions, to ensure the consistency of the database.
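The atomicity requirement (a funds transfer should either complete entirely or not happen at all) can be sketched with Python’s sqlite3 module. Assumptions: a hypothetical account table whose CHECK constraint forbids negative balances (a variant of the slides’ “account balance > 0” rule), and hypothetical helper names open_bank, transfer, and balance. Both updates run inside one transaction; if either one fails, ROLLBACK undoes the part that was already applied.

```python
import sqlite3


def open_bank(balances: dict) -> sqlite3.Connection:
    """Create a hypothetical account table; CHECK rejects negative balances."""
    # isolation_level=None: the sqlite3 module issues no implicit BEGIN,
    # so the explicit BEGIN/COMMIT/ROLLBACK below define each transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE account ("
        " acct_no TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    for acct_no, bal in balances.items():
        conn.execute("INSERT INTO account VALUES (?, ?)", (acct_no, bal))
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst atomically: both updates or neither."""
    conn.execute("BEGIN")
    try:
        # Credit first, so a failing debit shows that the credit is undone too.
        cur = conn.execute(
            "UPDATE account SET balance = balance + ? WHERE acct_no = ?",
            (amount, dst),
        )
        if cur.rowcount != 1:
            raise LookupError(f"no account {dst}")
        cur = conn.execute(
            "UPDATE account SET balance = balance - ? WHERE acct_no = ?",
            (amount, src),
        )
        if cur.rowcount != 1:
            raise LookupError(f"no account {src}")
    except (sqlite3.IntegrityError, LookupError):
        # CHECK violation (overdraft) or unknown account: undo all partial updates.
        conn.execute("ROLLBACK")
        return False
    conn.execute("COMMIT")
    return True


def balance(conn: sqlite3.Connection, acct_no: str) -> int:
    """Read the current balance of one account."""
    return conn.execute(
        "SELECT balance FROM account WHERE acct_no = ?", (acct_no,)
    ).fetchone()[0]


if __name__ == "__main__":
    bank = open_bank({"A-101": 500, "A-215": 100})
    print(transfer(bank, "A-101", "A-215", 200))   # True
    print(transfer(bank, "A-215", "A-101", 1000))  # False: would overdraw A-215
    print(balance(bank, "A-101"), balance(bank, "A-215"))  # 300 300
```

In the failed transfer, A-101 was already credited before the debit violated the constraint; the ROLLBACK removes that partial update, so no money is created or lost. Here the DBMS, not the application code, enforces both the integrity constraint and the all-or-nothing behavior.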