CHAPTER 1 Databases and Database Users

Introduction

Traditional database applications: The information stored and accessed is either textual or numeric.

New applications:
- Multimedia databases store pictures, video, …
- Geographic information systems (GIS) store and analyze maps, weather data, …
- Data warehouses are used in many companies to extract and analyze information.
- Real-time and active databases are used to control industrial and manufacturing processes.

Basic Definitions

- Database: A collection of related data.
- Data: Known facts that can be recorded and that have an implicit meaning. Example: names, telephone numbers, addresses, …
- Mini-world: Some part of the real world about which data is stored in a database. Example: student grades and transcripts at a university.
- Database Management System (DBMS): A software package or system that facilitates the creation and maintenance of a computerized database.
- Database System: The DBMS software together with the data itself.

Example of a Database

Mini-world for the example: Part of a UNIVERSITY environment.

Some mini-world entities:
- STUDENTs
- COURSEs
- SECTIONs (of COURSEs)
- (academic) DEPARTMENTs
- INSTRUCTORs

Some mini-world relationships:
- SECTIONs are of specific COURSEs
- STUDENTs take SECTIONs
- COURSEs have prerequisite COURSEs
- INSTRUCTORs teach SECTIONs
- COURSEs are offered by DEPARTMENTs
- STUDENTs major in DEPARTMENTs

Typical DBMS Functionality

- Define a database in terms of data types, structures, and constraints.
- Construct or load the database on a secondary storage medium.
- Manipulate the database: querying, generating reports, and inserting, deleting, and modifying its content.
- Support concurrent processing and sharing by a set of users and programs.
- Provide protection or security measures to prevent unauthorized access.
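The define, construct, and manipulate steps above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the student table, its columns, and the sample rows are hypothetical, and an in-memory database stands in for secondary storage (passing a file path to sqlite3.connect would store it on disk instead).

```python
import sqlite3

# In-memory database for the sketch; a file path would place it on disk.
conn = sqlite3.connect(":memory:")

# Define: data types, structure, and constraints (hypothetical STUDENT table).
conn.execute(
    """
    CREATE TABLE student (
        student_number INTEGER PRIMARY KEY,
        name           TEXT NOT NULL,
        major          TEXT NOT NULL
    )
    """
)

# Construct (load): populate the database with made-up initial data.
conn.executemany(
    "INSERT INTO student (student_number, name, major) VALUES (?, ?, ?)",
    [(17, "Smith", "CS"), (8, "Brown", "CS"), (21, "Lee", "MATH")],
)

# Manipulate: a query, then a modification and a deletion.
cs_students = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM student WHERE major = ? ORDER BY name", ("CS",)
    )
]
conn.execute("UPDATE student SET major = ? WHERE student_number = ?", ("MATH", 8))
conn.execute("DELETE FROM student WHERE student_number = ?", (21,))
conn.commit()

# Remaining rows after the update and delete.
remaining = conn.execute(
    "SELECT student_number, major FROM student ORDER BY student_number"
).fetchall()
```

Concurrency and access control are not shown; SQLite is an embedded engine and handles those differently from a multi-user server DBMS.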
Main Characteristics of the Database Approach

File processing systems: drawbacks of using file systems
- Data redundancy and inconsistency
- Difficulty in accessing data
- Integrity problems
- Security problems
- Concurrent access by multiple users…

Database approach:
- Self-describing nature of a database system: A DBMS catalog stores the description of the database. This description is called meta-data. It allows the DBMS software to work with different databases.
- Insulation between programs and data: Called program-data independence. Allows data storage structures and operations to change without having to change the DBMS access programs.
- Data abstraction: A data model is used to hide storage details and present the users with a conceptual view of the database.
- Support of multiple views of the data: Each user may see a different view of the database, which describes only the data of interest to that user.
- Sharing of data and multi-user transaction processing: Allows a set of concurrent users to retrieve and update the database.

Database Users

Actors on the scene: the users who actually use and control the database content.
- Database administrators (DBAs): Responsible for managing the database system, authorizing access, and coordinating and monitoring its use.
- Database designers: Responsible for designing the database, identifying the data to be stored, and choosing the structures to represent and store this data.
- End users: The persons who use the database for querying, updating, generating reports, etc.

Categories of end users:
- Casual end users: Occasional users (for example, middle- or high-level managers).
- Parametric end users: Use preprogrammed canned transactions to interact continuously with the database.
- Sophisticated end users: Use full DBMS capabilities to implement complex applications.
- Stand-alone users: Maintain personal databases.

System analysts and application programmers: Design and implement canned transactions for parametric users.

Workers behind the scene:
- DBMS designers and implementers: Design and implement the DBMS software package itself.
- Tool developers: Design and implement tools that facilitate the use of the DBMS software. Tools include design tools, performance tools, and special interfaces.
- Operators and maintenance personnel: Run and maintain the hardware and software environment for the database system.

Advantages of Using the Database Approach

- Controlling redundancy in data storage and in development and maintenance efforts.
- Sharing of data among multiple users.
- Restricting unauthorized access to data.
- Providing persistent storage for program objects.
- Providing storage structures for efficient query processing.
- Providing backup and recovery services.
- Providing multiple interfaces to different classes of users.
- Representing complex relationships among data.
- Enforcing integrity constraints on the database.
- Drawing inferences and actions using rules.

When Not to Use a DBMS

Main inhibitors (costs) of using a DBMS:
- High initial investment and possible need for additional hardware.
- Overhead for providing generality, security, concurrency control, recovery, and integrity functions.

When a DBMS may be unnecessary:
- If the database and applications are simple, well defined, and not expected to change.
- If there are stringent real-time requirements that may not be met because of DBMS overhead.
- If access to data by multiple users is not required.

CHAPTER 2 Database System Concepts and Architecture

Data Models

A data model comprises:
- A set of concepts to describe the structure of a database, together with certain constraints.
- A set of basic operations: retrievals and updates on the database.
- The dynamic aspect or behavior: allows the database designer to specify a set of valid user-defined operations.
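The three ingredients of a data model listed above (structure, constraints, operations) can be illustrated with a small sketch using Python's sqlite3 module. The course and section tables, the add_section helper, and the sample values are all hypothetical names chosen for this example; the relational model is used only because it is convenient to run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Structure and constraints (hypothetical COURSE and SECTION tables).
conn.execute(
    """
    CREATE TABLE course (
        course_number TEXT PRIMARY KEY,
        credit_hours  INTEGER NOT NULL CHECK (credit_hours > 0)
    )
    """
)
conn.execute(
    """
    CREATE TABLE section (
        section_id    INTEGER PRIMARY KEY,
        course_number TEXT NOT NULL REFERENCES course (course_number)
    )
    """
)


def add_section(section_id, course_number):
    """Hypothetical user-defined operation.

    Returns True if the insert satisfied every constraint, False otherwise.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO section (section_id, course_number) VALUES (?, ?)",
                (section_id, course_number),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Basic operation: a plain update (insert) on the database.
conn.execute("INSERT INTO course VALUES (?, ?)", ("CS1310", 4))
conn.commit()

accepted = add_section(85, "CS1310")    # refers to an existing course
rejected = add_section(92, "MATH9999")  # no such course, so the DBMS refuses it
section_count = conn.execute("SELECT COUNT(*) FROM section").fetchone()[0]
```

The point of the sketch is that the constraint lives in the schema, so every operation, built-in or user-defined, is checked against it by the DBMS rather than by each application program.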
Categories of Data Models

High-level (conceptual) data models: Provide concepts close to the way many users perceive data.
- Entity: A real-world object or concept to be represented in the database.
- Attribute: Some property of an entity.
- Relationship: Represents an interaction among entities.

Low-level (physical) data models: Provide concepts that describe the details of how data is stored in the computer.

Representational (implementation) data models: Provide concepts that fall between the above two, balancing user views with some computer storage details. They include:
- Relational data model
- Network model
- Hierarchical model

Schemas

A schema:
- Is the description of the database (not the database itself).
- Is specified during database design.
- Is not expected to change frequently.

A displayed schema is called a schema diagram. A schema diagram represents only some aspects of a schema (names of record types and data elements, and some types of constraints). Example: Each object in the schema, such as STUDENT or COURSE, is a schema construct.

Schemas versus Instances

- Database schema: The description of the data (meta-data). Includes descriptions of the database structure and the constraints.
- Database instance: The actual data stored in a database at a particular moment in time. Also called the database state (or occurrence).
- Difference between schema and state: At design time, the schema is defined and the state is the empty state. The state changes each time data is inserted or updated; the schema remains the same.

Three-Schema Architecture

- Internal schema, at the internal level, describes physical storage structures and access paths. It typically uses a physical data model.
- Conceptual schema, at the conceptual level, describes the structure and constraints for the whole database for a community of users. It uses a conceptual or an implementation data model.
- External schemas, at the external level, describe the various user views. They usually use the same data model as the conceptual level.
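The distinction between schema and state, and the three levels above, can be approximated in a short sketch with Python's sqlite3 module. The mapping is an analogy assumed for illustration, not something SQLite itself labels: a table stands in for the conceptual schema, an index for an internal-level access path, and a view for an external schema. All table, index, and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: structure and constraints for the whole database.
conn.execute(
    """
    CREATE TABLE student (
        student_number INTEGER PRIMARY KEY,
        name           TEXT NOT NULL,
        major          TEXT NOT NULL
    )
    """
)
# Internal level: an access path; it does not change what queries return.
conn.execute("CREATE INDEX idx_student_major ON student (major)")
# External level: a user view exposing only the data of interest.
conn.execute(
    "CREATE VIEW cs_student_names AS SELECT name FROM student WHERE major = 'CS'"
)


def schema():
    """Read the schema description (meta-data) from SQLite's catalog table."""
    return conn.execute(
        "SELECT type, name FROM sqlite_master ORDER BY name"
    ).fetchall()


def state():
    """Read the current database state (the stored rows)."""
    return conn.execute(
        "SELECT student_number, name, major FROM student ORDER BY student_number"
    ).fetchall()


schema_before, state_before = schema(), state()  # empty state at design time

conn.execute("INSERT INTO student VALUES (17, 'Smith', 'CS')")
conn.execute("INSERT INTO student VALUES (21, 'Lee', 'MATH')")

schema_after, state_after = schema(), state()
view_rows = conn.execute("SELECT name FROM cs_student_names").fetchall()
```

After the inserts, the state has changed while the catalog contents are identical, and the view returns only the rows its user is meant to see.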
In the three-schema architecture, mappings among schema levels are needed to transform requests and data. Programs refer to an external schema and are mapped by the DBMS to the internal schema for execution.

Data Independence

- Logical data independence: The capacity to change the conceptual schema without having to change the external schemas and their application programs.
- Physical data independence: The capacity to change the internal schema without having to change the conceptual schema.

When a schema at a lower level is changed, only the mappings between this schema and higher-level schemas need to be changed in a DBMS that fully supports data independence. The higher-level schemas themselves are unchanged. Hence, the application programs need not be changed, since they refer to the external schemas.

DBMS Languages

- Data Definition Language (DDL): Used by the DBA and database designers to specify the conceptual schema of a database.
- Data Manipulation Language (DML): Used to specify database retrievals and updates.
- High-level or non-procedural languages: For example, SQL. They are set-oriented and specify what data to retrieve rather than how to retrieve it. Also called declarative languages.
- Low-level or procedural languages: Record-at-a-time; they specify how to retrieve data and include constructs such as looping.

Centralized Architectures

Centralized DBMS: Combines everything into a single system, including the DBMS software, hardware, application programs, and user interface processing software.

Client-Server Architectures

The client-server architecture was developed to deal with computing environments in which a large number of workstations, file servers, printers, database servers, Web servers, and other equipment are connected via a network.

Clients:
- Provide appropriate interfaces and a client version of the system to access the server resources.
- Clients may be PCs or workstations with disks, with only the client software installed.
- Clients are connected to the servers via some form of network (LAN, wireless network, etc.).

DBMS server:
- Provides database query and transaction services to the clients.
- Is sometimes called a query and transaction server.

Two-Tier Client-Server Architecture

- User interface programs and application programs run on the client side.
- An interface called ODBC (Open Database Connectivity) provides an application programming interface (API) that allows client-side programs to call the DBMS. Most DBMS vendors provide ODBC drivers.
- A client program may connect to several DBMSs.
- Other variations of clients are possible: in some DBMSs, more functionality is transferred to clients, including data dictionary functions, optimization and recovery across multiple servers, … In such situations the server may be called the data server.

Three-Tier Client-Server Architecture

Web applications use a three-tier architecture, which adds an intermediate layer between the client and the database server: the application server or web server.

The intermediate layer:
- Stores the web connectivity software and the rules and business logic (constraints) part of the application, used to access the right amount of data from the database server.
- Acts like a conduit for sending partially processed data between the database server and the client.

Additional features (security):
- Encrypt the data at the server before transmission.
- Decrypt the data at the client.

Classification of DBMSs

Based on the data model used:
- Traditional: Relational, Network, Hierarchical.
- Emerging: Object-oriented, Object-relational.

Other classifications:
- Single-user (typically used with microcomputers) vs. multi-user (most DBMSs).
- Centralized (uses a single computer with one database) vs. distributed (uses multiple computers and multiple databases).

Distributed database systems have now come to be known as client-server-based database systems because they do not support a totally distributed environment, but rather a set of database servers supporting a set of clients.