INTRODUCTION

Basic Terms:

Data:
• Data is a collection of facts and figures that can be recorded and that have implicit meaning.
• It is a random, unorganized collection of measurements of certain qualities or attributes relating to an entity.

Information:
• Information is data that has been collected, processed, logically organized, and analyzed so that it can be used by a decision maker.

Data Processing:
• Capturing
• Verifying
• Classifying
• Sorting/Arranging
• Summarizing
• Calculating
• Storing
• Retrieving
• Reproducing
• Communicating

Database:
• A database is a collection of data, typically describing the activities of one or more related organizations.
• For example, a university database might contain information about the following:
  – Entities such as students, faculty, courses, and classrooms.
  – Relationships between these entities, such as students' enrollment in courses, faculty teaching courses, and the use of rooms for courses.

Database Management System:
• A DBMS is a collection of programs that enables users to create and maintain a database.
• It facilitates the process of:
  – Defining a database.
  – Constructing a database.
  – Manipulating a database.
• Defining a database means specifying its structure:
  – The name of the entity.
  – The attributes of the entity.
  – Data types.
  – Size.
  – Constraints.
  – Operations.
• Constructing a database:
  – Storing the database itself on some storage medium.
• Manipulating a database:
  – Querying.
  – Updating.
  – Generating reports, etc.
• Database systems are designed to manage large bodies of information.
• Management of data involves both defining structures for the storage of information and providing mechanisms for the manipulation of information.
• In addition, the database system must ensure the safety of the information stored, despite system crashes or attempts at unauthorized access.
• If data are to be shared among several users, the system must avoid possible anomalous results.

Properties of Database:
1. A database represents some aspects of the real world.
2. A database is a logically coherent collection of data with some inherent meaning.
3. A random assortment of data cannot be termed a database.
4. A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and preconceived applications in which these users are interested.

Types of Databases:
A) Classification based on Nature of Data:
  1. Traditional Database: Textual or numeric in nature.
  2. Geographic Information System: Maps, weather data, satellite images.
  3. Multimedia Database: Pictures, video clips, sound messages.
  4. Data Warehouses and Online Analytical Processing: Integration of data for decision making.
  5. Real Time and Active Database: Controlling industrial and manufacturing processes.
B) Classification based on Data Model:
  1. Hierarchical DB.
  2. Network DB.
  3. Relational DB.
  4. Object DB.
C) Classification based on Number of Users:
  1. Single-user systems.
  2. Multi-user systems.
D) Classification based on Number of Sites:
  1. Centralized DBMS.
  2. Distributed DBMS.
E) Classification based on Type of Processing:
  1. OLTP systems.
  2. OLAP systems.

Database Applications:
– Banking: customer information, accounts, loans, and banking transactions.
– Airlines: reservations and schedule information.
– Universities: student information, course registrations, and grades.
– Sales: customers, products, purchases.
– Telecommunications: keeping records of calls made, generating monthly bills, maintaining balances on prepaid calling cards, and storing information about the communication networks.
– Manufacturing: production, inventory, orders, supply chain.
– Human resources: employee records, salaries, tax deductions.

Purpose of Database Systems:
• In the early days, database applications were built directly on top of file systems.
• What is a File System?
• In a file system, permanent records are stored in various files, and different application programs are written to extract records from, and to add records to, the appropriate files.

Drawbacks of using file systems to store data:
1. Data redundancy and inconsistency:
  • Multiple file formats.
  • Programs written in multiple programming languages.
  • Duplication of information in different files.
  • Data inconsistencies.
2. Difficulty in accessing data:
  • A new program must be written to carry out each new task.
3. Data isolation:
  • Data is scattered across multiple files and formats.
  • It is difficult to write new application programs to retrieve the data.
4. Integrity problems:
  • Consistency constraints (integrity constraints, e.g. account balance > 1000) become “buried” in program code rather than being stated explicitly.
  • It is hard to add new constraints or change existing ones.
5. Atomicity of updates:
  • Failures may leave the database in an inconsistent state, with partial updates carried out.
  • Example: A transfer of funds from one account to another should either complete or not happen at all.
  • It is difficult to ensure this property in a file system.
6. Concurrent access by multiple users:
  • Concurrent access is needed for performance.
  • Uncontrolled concurrent accesses can lead to inconsistencies.
  • Example: Two people reading a balance and updating it at the same time.
7. Security problems:
  • It is hard to provide user access to some, but not all, data.

Database systems offer solutions to all the above problems.

Main Characteristics of Database Technology:
1. Self-describing nature of a database system:
  • A database system stores:
    – The database itself;
    – A description of the database structure and constraints.
  • A DBMS catalog stores the description of the database. The description is called meta-data.
  • This allows the DBMS software to work with different databases.
2. Insulation between programs and data:
  • This is called program-data independence.
  • Allows changing data storage structures and operations without having to change the DBMS access programs.
3. Data Abstraction:
  • A data model is used to hide storage details and present the users with a conceptual view of the database.
  • The conceptual representation does not include details of how data is stored or how operations are implemented.
4. Support of multiple views of the data:
  • Each user may see a different view of the database, which describes only the data of interest to that user.
  • A view is a subset of the database, or it may contain virtual data that is derived from the database but is not explicitly stored.
5. Sharing of data & Multi-user transaction Processing:
  • OLTP transactions.
  • Concurrency control mechanism.

Additional Benefits of Database Technology:
- Controlling redundancy in data storage.
- Sharing of data among multiple users.
- Restricting unauthorized access to data.
- Providing multiple interfaces to different classes of users.
- Representing complex relationships among data.
- Enforcing integrity constraints on the database.
- Providing backup and recovery services.
- Potential for enforcing standards.
- Flexibility to change data structures.
- Reduced application development time.
- Availability of up-to-date information.
- Economies of scale.

Users of Database:
1. Database Administrators:
  – Authorizing access to the database,
  – Coordinating & monitoring database use,
  – Acquiring software & hardware resources as needed,
  – Accountable for breaches of security or poor system response time.
2. Database Designers:
  – Identify the data to be stored in the database,
  – Select appropriate structures for storing the data,
  – Communicate with all users and understand their requirements,
  – Design a database that meets user requirements.
3. End Users:
  1. Casual End Users:
    • Occasionally access a database,
    • Need different information each time,
    • Use a sophisticated database query language.
    • E.g. middle or high level managers.
  2. Naïve or Parametric End Users:
    • Constantly query and update the database,
    • Use standard types of queries called canned transactions.
    • E.g. bank tellers, railway reservation clerks, etc.
  3. Sophisticated End Users:
    • Users who use the database to meet complex requirements.
    • E.g. engineers, scientists, business analysts.
  4. Stand-alone Users:
    • Use a ready-made program package to interact with the database.
4. System Analysts & Application Programmers:
  – System analysts determine the requirements of end users;
  – Application programmers implement these specifications as programs.
  – Both are called software engineers.
5. DBMS System Designers & Implementers:
  – Design & implement DBMS modules and interfaces as a S/W package.
6. Operators & Maintenance Personnel:
  – Responsible for the actual running and maintenance of the H/W & S/W environment for the database system.

Architecture:

Data Abstraction:
• Data abstraction means that the details of data storage are hidden from the users who do not need them.

Levels of Abstraction:
1. Physical level: Describes how a record (e.g., customer) is stored.
2. Logical level: Describes the data stored in the database and the relationships among the data.
3. View level: Application programs hide details of data types. Views can also hide information (such as an employee’s salary) for security purposes.

Database Schema:
• The description of a database is called the database schema.
• It is specified during database design.
• It includes the descriptions of the database structure and the constraints that should hold on the database.
• It is not expected to change frequently.

Schema Diagram:
• A diagrammatic display of (some aspects of) a database schema is called a schema diagram.

Database Instance/Database State:
• The actual data stored in a database at a particular moment in time is called the database instance, database state, or database occurrence.

Schemas VS Instances:
• The database schema changes very infrequently.
• The database state changes every time the database is updated.
• The schema is also called the intension, whereas the state is called the extension.

DBMS Three-Schema (Three-Level) Architecture:
In this DBMS architecture, schemas can be defined at three levels:
1. Internal level.
2. Conceptual level.
3. External level.

1. Internal Level:
• The internal level has an internal schema.
• It describes the physical storage structure of the database.
• The internal schema uses a physical data model, which describes the complete details of:
  – data storage,
  – access paths for the database, and
  – how data is retrieved from or inserted into the database.

2. Conceptual Level:
• The conceptual level has a conceptual schema.
• It describes the whole database for the different users who access it.
• The conceptual schema hides the details of the physical storage structures and concentrates on entities, relationships, and constraints.

3. External Level:
• The external level includes a number of external schemas or user views.
• Each external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group.

• The three schemas are only descriptions of data.
• The data actually exists only at the physical level.
• The DBMS transforms a user's request specified on an external schema into a request against the conceptual schema, and then into a request against the internal schema for processing over the database.
• The retrieved data must be reformatted to match the user’s external view.
• The processes of transforming requests and results between levels are called mappings.

Data Independence:
• Data independence is defined as the capacity to change the schema at one level of a database system without having to change the schema at the next higher level.
• There are two types of data independence:
  1. Logical Data Independence.
  2. Physical Data Independence.

1. Logical Data Independence:
• The capacity to change the conceptual schema without having to change the external schemas and their application programs.
• The conceptual schema is changed to:
  – Expand the database (adding a record type or data item).
  – Reduce the database (removing a record type or data item).
• Only the view definitions and the mappings need to be changed.
• Application programs that reference the external schema constructs must work as before.

2. Physical Data Independence:
• The capacity to change the internal schema without having to change the conceptual schema.
• Changes to the internal schema may be needed because some physical files have to be reorganized.
• E.g.: Creation of additional access structures to improve the performance of retrieval & update.
• If the same data as before remains in the database, the conceptual schema need not be changed.

Database Languages:
• Data Definition Language (DDL).
• Storage Definition Language (SDL).
• View Definition Language (VDL).
• Data Manipulation Language (DML):
  – High-level or nonprocedural DML (set at a time).
  – Low-level or procedural DML (record at a time).

DBMS Interfaces:
1. Menu-based interfaces for browsing:
  – Users are provided with a list of options (menus).
  – A query is composed by picking options from the menus.
  – There is no need to memorize the syntax of queries.
2. Forms-based interfaces:
  – A form is displayed to each user.
  – Data entry and retrieval are done through forms.
  – Used by naïve/parametric users.
  – The DBMS supports a form specification language.
3. Graphical user interfaces:
  – A schema is displayed to users in diagrammatic form.
  – A query is constructed by manipulating the diagram.
  – GUIs utilize menus & forms.
4. Natural language interfaces:
  – These interfaces accept requests written in English and try to understand them.
  – A natural language interface has its own schema, similar to the database conceptual schema.
  – A mapping is performed between the request and this schema.
5. Interfaces for parametric users:
  – Parametric users (e.g. bank tellers) have a small set of operations that they perform repeatedly.
  – A small set of abbreviated commands is included.
  – E.g. function keys can be programmed to initiate various commands.
6. Interfaces for the DBA:
  – Privileged commands are created.
  – These include commands for creating accounts, setting system parameters, granting account authorization, changing a schema, and reorganizing storage structures.

DBMS Component Modules:

Query processor:
• It handles high-level queries.
• It parses, validates, optimizes, and compiles or interprets a query, which results in the query plan.

Run-time DB processor:
• It handles database accesses at run-time: it receives retrieval or update operations and carries them out on the database.

DDL compiler:
• It processes schema definitions (DDL statements) and stores descriptions of the schemas (metadata) in the DBMS catalog.

Transaction Manager:
• With the cooperation of the concurrency and recovery subsystems, this module ensures that transactions, which consist of queries and other actions, are executed with atomicity, consistency, isolation, and durability.

Concurrency subsystem:
• Ensures that the individual actions of multiple transactions are executed in such an order that the result is the same as if the transactions had executed one at a time.

Recovery subsystem:
• Responsible for storing every change to the database separately on disk (i.e. in a log file). This information is used to restore the system to a consistent state after any failure has occurred.

Security & authorization subsystem:
• Responsible for protecting the database against unauthorized accesses.

File manager:
• The file manager controls accesses to DBMS information that is stored on disk, and it can use the OS’s file manager services.

Buffer manager:
• Manages the main memory buffers, which store data, metadata, indexes, statistics, and the log. Recall that all this information must be in main memory (i.e. in buffers) before it can be used.

Note: The Buffer Manager & File Manager together are often called the Stored Data Manager.
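
The atomicity that the transaction manager and recovery subsystem are described as providing can be illustrated with a small sketch using Python's built-in sqlite3 module. This is only an illustration under stated assumptions, not a description of any particular DBMS: the account table, its CHECK constraint, and the transfer helper are hypothetical names chosen for the example. The sketch reuses the funds-transfer case: the credit is applied first, so a failing debit would leave a partial update unless the whole transaction is rolled back.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` as a single transaction.

    Returns True if the transfer committed, False if it was rolled back.
    (Hypothetical helper written for this illustration.)
    """
    try:
        # Used as a context manager, the connection commits when the block
        # succeeds and rolls back when an exception leaves the block.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit above has
        # already been undone by the rollback.
        return False


conn = sqlite3.connect(":memory:")
# The integrity constraint is stated explicitly in the schema instead of
# being buried in application code.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 100)])
conn.commit()

ok = transfer(conn, 1, 2, 200)       # succeeds: both updates are committed
failed = transfer(conn, 1, 2, 1000)  # debit violates the CHECK: rolled back
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

After the second call the balances are unchanged from the state left by the first call, which is the "either complete or not happen at all" behaviour. A real DBMS achieves this with its log and recovery subsystem; here SQLite's own transaction mechanism stands in for those modules.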
Example of DBMS module interactions:
• Suppose a user requests to update a record of a relation (see Figure).
• The DML statement is issued to the query processor (1), which parses it, checks whether the user has the privileges to perform the operation using the metadata from the catalog held in the main memory buffers (2), and finally issues the query plan, which is the optimized way to execute the statement (3).
• The run-time DB processor takes the query plan and executes it by writing the update in the buffer if the record is there (4). Otherwise, it requests the buffer manager to get the record (5).
• The buffer manager requests the corresponding block from the file manager (6), which gets it from the disk (7) and gives it back to the buffer manager, which puts it in the buffer (8).
• The run-time DB processor can then execute the statement.
• Before executing the statement, the run-time DB processor informs the concurrency subsystem (9), so that it can check that concurrent accesses cause no inconsistency, and the recovery subsystem (10), so that it can take the necessary actions to allow recovery in case of failure.

System Architectures for DBMS:
We distinguish between two different DBMS architectures:
  – Centralized DBMS architecture, for earlier systems.
  – Client-Server DBMS architecture, for current systems.

Centralized DBMS Architecture:
• Earlier computer system architectures were based on mainframe computers.
• Mainframe computers provide the main processing for all system functions.
• Most users accessed such systems via computer terminals that provide display capabilities but no processing power.
• Hence, in the centralized DBMS architecture, all functions of the system, including user application programs, user interface programs, and all the DBMS functionality, are executed on the mainframe computer.

Client-Server Architecture:
• Client-server computer system architectures emerged with new computing environments.
• A large number of PCs, workstations, specialized servers, and other equipment are connected together via a network.
• A client-server architecture contains:
  – Specialized servers with specific functionalities providing resources:
    • File server, printer server, web server, e-mail server, etc.
  – Client machines that provide users with an appropriate interface to utilize these servers, as well as local processing power to run local applications.
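
The split between a specialized server that provides a resource and a client that provides the interface can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a DBMS protocol: the GRADES dictionary, the LookupHandler class, the query function, and the one-line text protocol are all hypothetical, and both ends run on the local machine for convenience.

```python
import socket
import socketserver
import threading

# Hypothetical data held only on the server side.
GRADES = {"alice": "A", "bob": "B"}


class LookupHandler(socketserver.StreamRequestHandler):
    """Server side: read one name per connection and reply with the grade."""

    def handle(self):
        name = self.rfile.readline().decode().strip()
        reply = GRADES.get(name, "NOT FOUND")
        self.wfile.write((reply + "\n").encode())


def query(address, name):
    """Client side: send a name to the server and return its one-line reply."""
    with socket.create_connection(address) as sock:
        sock.sendall((name + "\n").encode())
        with sock.makefile("r") as reader:
            return reader.readline().strip()


# Port 0 asks the operating system for any free port on the loopback interface.
server = socketserver.TCPServer(("127.0.0.1", 0), LookupHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

result_known = query(server.server_address, "alice")
result_unknown = query(server.server_address, "carol")

server.shutdown()
server.server_close()
```

The client never touches the data directly; it only formats a request and displays the reply, while all access to the stored data stays on the server. In a client-server DBMS the request would typically be a query rather than a single name, but the division of work is the same.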