MIS 335 - Database Systems
http://www.mis.boun.edu.tr/durahim/
Ahmet Onur Durahim

Learning Objectives
• Database systems concepts
• Designing and implementing a database application
• Life of a Query in a Database System

Database Systems
• Data?
  – Streams of raw facts representing events occurring in organizations
• Information?
  – Data shaped into a meaningful form that is useful to human beings
• Database
  – Organized collection of data

Information vs. Data
• Raw data is processed and organized to produce meaningful and useful information
  – Total unit sales, total sales revenue

Database Management System (DBMS)
• DBMS contains information about a particular enterprise
  – Collection of interrelated data
  – Set of programs to access the data
  – An environment that is both convenient and efficient to use
• Database Applications:
  – Banking: all transactions
  – Airlines: reservations, schedules
  – Universities: registration, grades
  – Sales: customers, products, purchases

Transaction Processing Systems
• Serve operational managers and staff
  – Keep track of the elementary activities and transactions of the organization (sales, cash deposits, flow of materials)
  – Monitor the status of internal operations and the firm’s relations with the external environment
• Perform and record daily routine transactions necessary to conduct business
  – Sales order entry, payroll, and shipping
• Serve predefined, structured goals, tasks and resources at the operational level
• Major producers of information for the other systems and business functions

Transaction Processing Systems

Data Management Systems
• Responsible for organizing and managing the firm’s data so that they can be efficiently accessed and used
• Allow the definition, creation, querying, update, and administration of databases (transactions, persistence of data, recovery, concurrency control)

Database Software Providers
– Commercial
  • Microsoft SQL Server, Oracle, IBM DB2, Sybase
  • These 4 companies supply more than 90% of the US database market
– Open Source
  • MySQL, PostgreSQL

Data Management and Storage
• NoSQL and Big Data Systems
  – Amazon Dynamo, Cassandra, MongoDB, Neo4j
  – Apache Hadoop

Why do we need DBMSs?
• Computers were originally developed for number crunching
• Over time, data storage and processing became as important as scientific computing
• Amount and types of data increased
  – Image/audio/video data
  – Genome data
  – Customer transactions
• DBMSs were developed to manage this data

Why do we need DBMSs?
• Data independence
• Efficient access
• Data integrity and security
• Uniform data administration
• Concurrent access
• Recovery from crashes
• User-friendly declarative query language

File Systems vs Databases

Terminology and Basic Ideas
• Data Model: Describes the conceptual structuring of data stored in the database
  – For example, a data model may be a set of records (each record might have a studentID and a name)
• The relational model of data is the most widely used model today
  – Main concept: relation, basically a table with rows and columns
  – Every relation has a schema, which describes the columns, or fields
• Schema vs. Data
  – Schema describes how data is to be structured
    • defined at set-up time, rarely changes
    • part of the "metadata"
  – Data is the actual "instance" of the database, may change rapidly

Relational Model

Relational Model Concepts

Terminology and Basic Ideas
• Data Definition Language (DDL)
  – Commands for setting up the schema of the database
  – Process of designing the schema can be complex
    • may use a design methodology and/or tool
• Data Manipulation Language (DML)
  – Commands to manipulate data in the database: RETRIEVE, INSERT, DELETE, MODIFY
  – Also called "query language"

Database Design
• The process of designing the general structure of the database
  – Logical Design: Decide on the “good” collection of the relation schemas
  – Logical modeling deals with gathering business requirements and converting those requirements into a model
    • Business decision: What attributes should we record in the database?
    • IS decision: What relation schemas should we have and how should the attributes be distributed among the various relation schemas?

Database Design
• The process of designing the general structure of the database
  – Physical Design: Decide on the physical layout of the database
  – Physical modeling deals with the conversion of the logical, or business, model into a relational database model
  – Database software specific
    • The objects defined during physical modeling can vary depending on the relational database software being used.
    • Most relational database systems vary in the way data types are represented and the way data is stored, although basic data types are conceptually the same among different implementations.

Logical vs Physical Modeling

Structured Query Language (SQL)
• Widely used non-procedural database query language
• Question: What is the name, email address and yearly income of the customer with customerKey ‘11009’?

Customer Table
DimCustomer

Components of a DBMS
• A database system has five main pieces:
  – Client Communications Manager that manages communication between the users and the database
  – Process Manager that encapsulates and schedules the various tasks in the system
  – A statement-at-a-time query processing engine
  – A shared transactional storage subsystem that knits together storage, buffer management, concurrency control and recovery
  – A set of shared utilities including memory management, disk space management, replication, and various batch utilities used for administration

Architectural Components of a DBMS

Life of a Query
• A database interaction at an airport, in which a gate agent clicks on a form to request the passenger list for a flight
• This button click results in a single-query transaction

Life of a Query
• The personal computer at the airport gate (the “client”) calls an API that in turn communicates over a network to establish a connection with the Client Communications Manager of a DBMS
  – It is the responsibility of the DBMS’ client communications manager
    • to establish and remember the connection state for the caller
    • to respond to SQL commands from the caller
    • to return both data and control messages (result codes, errors, etc.) as appropriate
• In the gate agent’s query example, the communications manager would
  – establish the security credentials of the client
  – set up state to remember the details of the new connection and the current SQL command across calls
  – forward the client’s first request deeper into the DBMS to be processed

Life of a Query
• Upon receiving the client’s first SQL command, the DBMS (Process Manager) must assign a “thread of computation” to the command
  – Makes a decision regarding admission control: whether the system should begin processing the query immediately, or defer execution until enough system resources are available to devote to this query
• The Relational Query Processor begins executing the gate agent’s query
  – Checks that the user is authorized to run the query, and compiles the user’s SQL query text into an internal query plan
  – The plan executor consists of a suite of “operators” (relational algorithm implementations) for executing any query
    • Operators implement relational query processing tasks including joins, selection, projection, aggregation, sorting, etc.

Life of a Query
• One or more operators exist to request data from the database. These operators make calls to fetch data from the DBMS’ Transactional Storage Manager
  – Manages all data access (read) and manipulation (create, update, delete) calls
  – Invokes the transaction management code to ensure the well-known “ACID” properties of transactions
  – Before accessing data:
    • locks are acquired from a lock manager to ensure correct execution in the face of other concurrent queries.
    • If the gate agent’s query involved updates to the database, it would interact with the log manager to ensure that the transaction was durable if committed, and fully undone if aborted

Life of a Query
• The agent’s query has begun to access data records, and is ready to use them to compute results for the client.
  – done by “unwinding the stack” of activities
• The access methods return control to the query executor’s operators
  – which orchestrate the computation of result tuples from database data
    • as result tuples are generated, they are placed in a buffer for the client communications manager
      – which ships the results back to the caller.
• At the end of the query, the transaction is completed and the connection closed
  – the transaction manager cleans up state for the transaction
  – the process manager frees any control structures for the query
  – the communications manager cleans up communication state for the connection

Life of a Query
• Shared components and utilities are vital to the operation of a full-function DBMS.
• The catalog and memory managers are invoked as utilities during any transaction.
  – The catalog is used by the query processor during authentication, parsing, and query optimization.
  – The memory manager is used throughout the DBMS whenever memory needs to be dynamically allocated or deallocated.
• The remaining modules run independently of any particular query, keeping the database as a whole well tuned and reliable
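
Life of a Query: a client-side sketch
The gate agent example can be sketched roughly from the client's point of view. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for the DBMS, and because SQLite is an embedded library there is no network hop and no separate Client Communications Manager or Process Manager. The passenger table, its columns, and the flight numbers are hypothetical names made up for this example.

```python
import sqlite3


def passenger_list(conn, flight_no):
    """Run the gate agent's single-query transaction and return the result tuples."""
    # Using the connection as a context manager commits on success and
    # rolls back if an exception is raised.
    with conn:
        cursor = conn.execute(
            # Declarative query: we state what we want, not how to fetch it.
            "SELECT seat, name FROM passenger WHERE flight_no = ? ORDER BY seat",
            (flight_no,),
        )
        # Result tuples are handed back to the caller.
        return cursor.fetchall()


def demo():
    # Establish a connection (in-memory database, hypothetical data).
    conn = sqlite3.connect(":memory:")
    # DDL: set up the schema.
    conn.execute("CREATE TABLE passenger (flight_no TEXT, seat TEXT, name TEXT)")
    # DML: insert a few rows.
    conn.executemany(
        "INSERT INTO passenger VALUES (?, ?, ?)",
        [
            ("TK1821", "12A", "Ayse"),
            ("TK1821", "03C", "Mehmet"),
            ("TK0007", "01A", "Deniz"),
        ],
    )
    conn.commit()
    rows = passenger_list(conn, "TK1821")
    # End of the interaction: close the connection so its state is cleaned up.
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

In a client/server DBMS the same three steps (connect, send one SQL command, read the result tuples) would go through the network API and the components described in the slides; only the connection call would differ.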