Week 14, November 28
• Database Security
• Transaction Management and Concurrency Controls
• Distributed Database
• Data Warehouse, Data Marts and MDDBMS
• OODBMS
R. Ching, Ph.D. • MIS • California State University, Sacramento

Database Security
• The protection of the database against threats using both technical and administrative controls
• Database security aims to minimize losses caused by anticipated events in a cost-effective manner without unduly constraining the users
• Threats:
  – Theft and fraud
  – Loss of confidentiality
  – Loss of privacy
  – Loss of integrity
  – Loss of availability
[Figure: organization policy and controls (objectives for system) surround database security, which protects an organizational resource]

Threats
• Any situation or event, whether intentional or unintentional, that will adversely affect a system and consequently the organization
  – Tangible losses (hardware, software, data)
  – Intangible losses (credibility, confidentiality)
• Countermeasures and contingency plans

Threats and Countermeasures
• Initiate countermeasures to overcome threats
  – Consider the types of threat and their impact on the organization
    • Cost-effectiveness
    • Frequency
    • Severity
• Objective is to achieve a balance between a reasonably secure operation, which does not unduly hinder users, and the costs of maintaining it
[Figure: secured operations balanced against costs and risks]
• Risks are independent of the countermeasures

Countermeasures
• Computer-based vs. non-computer-based
  – Computer-based: implemented through the operating system and/or DBMS
  – Non-computer-based: management policies and procedures

Computer-Based Controls
• Computer-based controls
  – Authorization
  – Views
  – Backup (and recovery)
  – Journaling
  – Checkpointing
  – Integrity
  – Encryption
  – Associated procedures

Computer-based Control: Authorization or Access Controls
• Granting privileges that enable users and applications to legitimately access a system or object (table, view, application, procedure, etc.)
  – Authentication ensures that users are who they claim to be
• Layers of access or penetration into a system
  – Ownership and privileges
    • Access to database(s)
    • Manipulation and definition of data

Authorization and Authentication
[Figure: an O/S user passes through the operating system and a DBMS user passes through the DBMS; grants control access to databases and tables (objects and privileges)]

Computer-based Control: Views
• Virtual relation to support a user's particular needs
  – Restricts access and actions
  – Created upon demand of the user
[Figure: a virtual relation derived from base relations]

Computer-based Control: Tables
  SQL> grant select, update, delete on comp_products to scott;
  Grant succeeded.
  SQL> revoke delete on comp_products from scott;
  Revoke succeeded.
• General form (privilege, table, user name):
  GRANT privilege ON table TO user;
  REVOKE privilege ON table FROM user;

Transaction Management
What is a "transaction?"
• An action or series of actions, carried out by a single user or application program, which reads or updates (changes) the contents of the database
  – Retrievals
  – Updates (modifications)
  – Insertions
  – Deletions
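The definition of a transaction above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module rather than Oracle, and the accounts table, its rows, and the transfer function are hypothetical names chosen only for illustration. The point it shows is that a series of actions either takes effect as a whole or not at all.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single transaction (hypothetical example)."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {account id: balance} for inspection."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

ok = transfer(conn, "A", "B", 30)       # both updates are committed together
failed = transfer(conn, "A", "B", 500)  # the first update is rolled back
print(ok, failed, balances(conn))
```

The second transfer leaves no trace in the table even though its first UPDATE had already run, which is the behavior a transaction is meant to guarantee.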
What is a "transaction?"
• Characteristics
  – Atomicity (entirety of action)
  – Consistency (from one consistent state to another)
  – Isolation (independent of other transactions)
  – Durability (permanence)

Transaction Management
• Provide a means for maintaining the integrity of the database
• Importance:
  – In a multi-user environment, the order of the transactions' actions must be maintained through concurrency control
  – In the event of a failure or destruction of data, data must be reconstructed through database recovery
[Figure: data integrity]

Concurrency Control
• The process of managing simultaneous operations on the database without having them interfere with one another
• Potential problems:
  – Lost update problem (one update overrides another)
  – Uncommitted dependency problem (intermediate results of one update viewed by another before it has been committed)
  – Inconsistent analysis problem (data retrieved by one user updated by another before the end of the retrievals)
  – Nonrepeatable read (retrieval results cannot be repeated)

Concurrency Control
• Serializability: scheduling transactions to maximize concurrency and parallelism, while preventing them from interfering with one another and maintaining consistency
  – Serial schedule: non-interleaved transactions
      T1  T2  T3  ...  Tn
  – Nonserial schedule: interleaved transactions
      T1  T3  ...  Tn
      T2  T4  T5  T6  ...  Tn+1
  – Conflict: the scheduler must resolve the conflict

Concurrency Control: Locking and Timestamping
• Locking
  – Prevents simultaneous access or update of the same data
• Timestamping
  – Ordering (prioritizing) transactions by their timestamp
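The lost update problem and the locking remedy described above can be sketched in a few lines of Python. This is an illustration only: the balance of 100, the +50 and -30 updates, and the Account class are hypothetical, and threading.Lock merely stands in for an exclusive lock held by a DBMS.

```python
import threading

def interleaved_schedule(balance):
    """Nonserial schedule in which both transactions read before either writes."""
    t1_read = balance        # T1 reads the balance
    t2_read = balance        # T2 reads the same (soon to be stale) balance
    balance = t1_read + 50   # T1 writes its update
    balance = t2_read - 30   # T2 writes, overriding T1's update (lost update)
    return balance

def serial_schedule(balance):
    """Serial schedule: T1 finishes before T2 starts."""
    balance = balance + 50
    balance = balance - 30
    return balance

class Account:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()  # stands in for an exclusive lock on the data item

    def apply(self, delta):
        with self._lock:               # read and write happen as one protected step
            current = self.balance
            self.balance = current + delta

def run_locked(n_threads=8, per_thread=1000):
    """Many concurrent updates; the lock keeps every one of them."""
    account = Account(0)

    def worker():
        for _ in range(per_thread):
            account.apply(1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return account.balance

print(interleaved_schedule(100), serial_schedule(100), run_locked())
```

The interleaved schedule ends with a balance that matches no serial ordering of the two transactions, which is exactly what a serializable scheduler is expected to rule out.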
Concurrency Controls: Locking
• Locking methods: a lock denies other users access to the data while one user is accessing them
  – Shared vs. exclusive lock
  – "Deadly embrace" or deadlock: when a user has a lock on one data item and awaits another, and a second user awaits the data item locked by the first user and has a lock on the data item sought by the first
    • First user: Account balance (locked), Credit limit (waiting)
    • Second user: Credit limit (locked), Account balance (waiting)

Concurrency Controls: Timestamping
• All transactions are assigned a timestamp (a unique identifier that indicates its relative starting time)
• Smaller (older) timestamps are given priority
• Conflicts are resolved through rollbacks and restarts
  – Transaction rolled back (to its beginning) and restarted (reassigned a newer timestamp)

Timestamping
• Problems
  – A younger transaction writes a data item before an older transaction accesses it
  – An older transaction needs to write a data item already accessed by a younger transaction
  – An older transaction needs to write a data item already written by a younger transaction
• Resolved through rollbacks and restarts

Distributed Databases
• Distributed database: a logically interrelated collection of shared data, physically distributed over a computer network
• DDBMS: software system that permits the management of the distributed database and makes the distribution transparent to the user
[Figure: geographically distributed sites (Site 1, Site 2, Site 3) joined by a network that is transparent to the user; each site has a DDBMS, a global data dictionary, a local DBMS and a database. Heterogeneous vs. homogeneous]

DDBMS Architecture
[Figure: two sites linked by data communications; each has a DDBMS, a global data dictionary, a local DBMS and a database. Schema levels: global external schema, global conceptual schema, local external schema, local conceptual schema, local internal schema]

Data Allocation
• Centralized
• Partitioned (fragmented)
  – Vertical (by columns)
  – Horizontal (by rows)
  – Mixed (by columns and rows)
• Complete replication
• Selective replication (hybrid)
  – Combination of partitioning, replication and centralization

Advantages to Distributing
• Reflects organizational (distributed) structure
• Improved shareability and local autonomy
• Improved availability
• Improved reliability
• Improved performance
• Economics
• Modular growth
• Integration
• Remaining competitive

Disadvantages to Distributing
• Complexity
• Cost
• Security
• Integrity control more difficult
• Lack of standards
• Lack of experience
• Database design more complex

Considerations for Fragmenting
• Usage
  – Fragmenting by subsets
• Efficiency
  – Store data where they are used most frequently
• Parallelism
  – Parallel execution of a query (divided into subqueries)
• Security
  – Store data away from sites that do not require them

Disadvantages to Fragmenting
• Performance
  – Increased retrieval time
• Integrity
  – Difficult to maintain across multiple sites
  – What happens when two users need to update the same data?
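Horizontal and vertical partitioning, as listed under data allocation above, can be sketched with plain Python data. The customers rows, the region values, and the function names are hypothetical. The sketch assumes that each vertical fragment repeats the key column so the original rows can be rebuilt with a join, and that horizontal fragments are rebuilt with a union.

```python
# Hypothetical customer relation; "id" plays the role of the primary key.
customers = [
    {"id": 1, "name": "Ames", "region": "West", "credit_limit": 5000},
    {"id": 2, "name": "Baker", "region": "East", "credit_limit": 2000},
    {"id": 3, "name": "Cruz", "region": "West", "credit_limit": 7500},
    {"id": 4, "name": "Diaz", "region": "East", "credit_limit": 1000},
]

def horizontal_fragment(rows, predicate):
    """Partition by rows: keep whole rows that satisfy the predicate."""
    return [dict(row) for row in rows if predicate(row)]

def vertical_fragment(rows, columns, key="id"):
    """Partition by columns: keep the key plus the chosen columns."""
    return [{name: row[name] for name in (key, *columns)} for row in rows]

def rebuild_from_horizontal(*fragments, key="id"):
    """Union of the row fragments, ordered by key."""
    merged = [row for fragment in fragments for row in fragment]
    return sorted(merged, key=lambda row: row[key])

def rebuild_from_vertical(left, right, key="id"):
    """Join of the column fragments on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]

west = horizontal_fragment(customers, lambda row: row["region"] == "West")
east = horizontal_fragment(customers, lambda row: row["region"] == "East")
profile = vertical_fragment(customers, ("name", "region"))
credit = vertical_fragment(customers, ("credit_limit",))

print(rebuild_from_horizontal(west, east) == customers)
print(rebuild_from_vertical(profile, credit) == customers)
```

Rebuilding either way touches more than one fragment, which hints at why retrieval time and integrity appear among the disadvantages of fragmenting.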
Transparency
• Distribution: users perceive the database as a single logical entity. The user should NOT be aware of where the data reside or are allocated
  – Fragmentation transparency
  – Location transparency
  – Replication transparency
  – Local mapping transparency
• Transaction: all distributed transactions maintain the distributed database's integrity and consistency
  – Concurrency transparency
  – Failure transparency

Transparency
• Performance
  – DDBMS must perform as if it were a centralized DBMS
• DBMS
  – Hides the knowledge that the local DBMS may be different (applicable to heterogeneous DDBMS)

Robert Anthony's Taxonomy of Managerial Information Requirements
• Information requirements change between levels of management

  Information requirement   Operational Control   Management Control   Strategic Planning
  Required accuracy         High                  <------------------> Low
  Frequency of use          Very frequent         <------------------> Infrequent
  Currency                  Highly current        <------------------> Quite old
  Time horizon              Historical            <------------------> Future
  Level of aggregation      Detailed              <------------------> Aggregate
  Scope                     Well defined          <------------------> Wide
  Source                    Internal              <------------------> External

• Transaction-based databases (shown at the operational control level)
  – Relational (Oracle, DB2, SQL7)
  – Hierarchical (IMS)
  – Network (Image)

Data Warehousing
• A subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process.
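The "time-variant" and "nonvolatile" parts of the definition above can be illustrated with a minimal sketch, assuming a warehouse is loaded as a series of dated snapshots that are appended and never overwritten. SnapshotStore, its methods, and the inventory figures are hypothetical and not part of any product.

```python
from datetime import date

class SnapshotStore:
    """Append-only store of dated snapshots (hypothetical illustration)."""

    def __init__(self):
        self._rows = []  # (snapshot_date, key, value); rows are never updated in place

    def load(self, snapshot_date, operational_data):
        """Append one snapshot of operational data, stamped with its date."""
        for key, value in operational_data.items():
            self._rows.append((snapshot_date, key, value))

    def history(self, key):
        """All recorded values for a key, oldest snapshot first."""
        return sorted((d, v) for d, k, v in self._rows if k == key)

    def as_of(self, key, when):
        """Value from the most recent snapshot taken on or before `when`."""
        candidates = [(d, v) for d, k, v in self._rows if k == key and d <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda pair: pair[0])[1]

store = SnapshotStore()
store.load(date(2005, 1, 31), {"printers_on_hand": 120})
store.load(date(2005, 2, 28), {"printers_on_hand": 95})

print(store.history("printers_on_hand"))
print(store.as_of("printers_on_hand", date(2005, 2, 15)))
```

Because the February load appends rather than replaces, the January figure is still available for historical queries.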
[Figure: internal data (within the organization) and external data feed the time-variant data warehouse; tools (report generators, EIS, OLAP, data mining) use summarized data and ad hoc queries to produce information for decision-making and competitive or strategic advantage]

Data Warehousing Characteristics
• Subject-oriented: organized around the major business subjects or entities, such as customers, orders or products
• Integrated: operational (internal) data and external data are integrated into the data warehouse to provide a single unified database for decision support
• Time-variant: uses time stamps to represent historical data. Data warehouses consist of a long series of snapshots, each of which represents operational data captured at a point in time
• Nonvolatile: new data are appended, rather than replaced, so that historical data are preserved

Data Warehouse
[Figure: data warehouse architecture. External sources pass through the load manager (inflow); the warehouse manager holds detailed data, lightly summarized data, highly summarized data and metadata (upflow, meta-flow) and archive/backup data (downflow); the query manager serves end-user tools (outflow)]

Data Mart
• A subset of a data warehouse that supports the requirements of a particular department or business function
[Figure: summarized data are extracted from a relational database (Oracle9i) into a multidimensional database (Oracle Express). End-user tools: reporting, EIS, OLAP, data mining]

Implementation
• Build data warehouse first
• Build data marts first
• Build both in parallel
[Figure: architecture in which the data warehouse and data marts are developed and implemented in parallel]

Multi-dimensional Database (MDDBMS)
[Figure: cube with three dimensions: products, geographic locations, and sales medium (e.g., retail, Internet, mail order)]
• Time is an implied dimension

Multi-dimensional Database (MDDBMS)
For example…
[Figure: cube with products (computers, printers, scanners, cameras), sales medium (retail, mail, Internet) and geographic locations]

Multi-dimensional Database (MDDBMS): Working with Two Dimensions
[Figure: total revenue by sales medium (Internet, mail order, retail) and time (years '95 to '99, quarters Q1 to Q4, months April, May, June) for electronics products (audio: receivers, speakers; visual; CD/DVD; entertainment); the layout is repeated for each quarter, each medium and each year]

Multi-dimensional Database (MDDBMS): Working with Three Dimensions
[Figure: the same total revenue layout with a geographic dimension added (USA, N. America, Europe, Asia)]

Dimensions
[Figure: Oracle Express dimensions: time dimension, retail sales dimension, distribution channels dimension]

Data Warehousing Configuration: Star Schema
• Fact table at the center, joined to dimension tables:
  – Dimension table (Sales medium)
  – Dimension table (Product line)
  – Dimension table (Geographic divisions)
  – Dimension table (Sales staff)
• Time is an implied dimension
• Questions the schema can answer:
  – Which sales mode is becoming more effective for certain products in particular regions?
  – Which sales staff produced the highest level of sales for a particular product line in California?
  – What products sold well in different regions of the country through e-commerce (list by quarters)?
  – What is the growth rate for the past 5 years in retail sales of a particular product line by region?

OODBMS
[Figure: objects (OID, message, data) vs. entities]
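Returning to the star schema configuration above, the sketch below builds a tiny fact table with three dimension tables in SQLite and answers one of the slide's questions (products sold through e-commerce, by region and quarter). All table names, column names, and figures are hypothetical, and storing the quarter directly in the fact table is an assumption made to reflect time being an implied dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_line (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales_medium (medium_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE region       (region_id INTEGER PRIMARY KEY, name TEXT);

-- Fact table: one row per sale, with a foreign key to each dimension table.
CREATE TABLE sales_fact (
    product_id INTEGER REFERENCES product_line(product_id),
    medium_id  INTEGER REFERENCES sales_medium(medium_id),
    region_id  INTEGER REFERENCES region(region_id),
    quarter    TEXT,
    revenue    INTEGER
);

INSERT INTO product_line VALUES (1, 'Printers'), (2, 'Scanners');
INSERT INTO sales_medium VALUES (1, 'Internet'), (2, 'Retail');
INSERT INTO region       VALUES (1, 'West'), (2, 'East');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 'Q1', 100),
    (1, 1, 1, 'Q1', 50),
    (1, 2, 1, 'Q1', 70),
    (2, 1, 2, 'Q1', 40),
    (1, 1, 1, 'Q2', 90);
""")

# Internet revenue by region, product line and quarter.
internet_sales = conn.execute("""
    SELECT r.name, p.name, f.quarter, SUM(f.revenue)
    FROM sales_fact f
    JOIN product_line p ON p.product_id = f.product_id
    JOIN sales_medium m ON m.medium_id = f.medium_id
    JOIN region r       ON r.region_id = f.region_id
    WHERE m.name = 'Internet'
    GROUP BY r.name, p.name, f.quarter
    ORDER BY r.name, p.name, f.quarter
""").fetchall()

for row in internet_sales:
    print(row)
```

Each question on the slide follows the same pattern: filter on one or more dimension tables, then aggregate the measures held in the fact table.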
Object-Oriented Concepts
• Methods (functions) determine the behavior of the object
• Message
  – External call to the object
  – Activates a method
• OID (object identifier)
  – System generated
  – Unique
  – Invariant
  – Independent of attribute values
  – Invisible to the user
• Data: attributes or instance variables
  – Simple
  – Complex
  – Reference

Relational vs. Object-Relational
[Figure: relational: tables, views, built-in data types. Object-relational: tables, views and built-in data types, plus abstract data types, object tables and object views. Source: David A. Anstey, 1997]

Data Types
• Built-in
  – Character (char, varchar2)
  – Number (integer, decimal, number)
  – Date
  – Raw and long raw
  – RowID
  – LOB (CLOB, BLOB)

ADTs (Abstract Data Types)
• User-defined data types
• Composed of simple or built-in data types
• Types: object types and collection (aggregate) types
[Figure: a table contains an object type (ADT), which is composed of built-in data types]

New Data Type: VARRAY
• Single-dimension arrays with a fixed maximum number of elements
  SQL> create or replace type contact_addresses as varray(4) of varchar2(30);
    /
  Type created.
  SQL> create or replace type contact_zip_codes as varray(4) of char(8);
    /
  Type created.

Object Types
• Three components:
  – Name: unique identifier of the object
  – Attributes: describe the object through built-in and abstract data types
  – Method: dictates the behavior of the object
  SQL> create type students as object
    (student_ID char(9),
     student_information personal_information);
    /
  Type created.
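As a rough analogy for the VARRAY and object type slides above, the Python sketch below models a bounded array and an object type that embeds another user-defined type. It is not Oracle code: BoundedArray, PersonalInformation, Student, the full_name method, and the sample values are hypothetical stand-ins for contact_addresses, personal_information, and students.

```python
from dataclasses import dataclass

class BoundedArray:
    """Ordered collection with a fixed maximum size, loosely like VARRAY(n)."""

    def __init__(self, max_size, items=()):
        self.max_size = max_size
        self._items = []
        for item in items:
            self.append(item)

    def append(self, item):
        if len(self._items) >= self.max_size:
            raise ValueError(f"at most {self.max_size} elements allowed")
        self._items.append(item)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

@dataclass
class PersonalInformation:
    """User-defined type built from built-in types and another user-defined type."""
    first_name: str
    last_name: str
    address: BoundedArray  # stands in for contact_addresses, varray(4)

@dataclass
class Student:
    """Object type with a name, attributes, and a method."""
    student_id: str
    student_information: PersonalInformation

    def full_name(self):  # a method determines the behavior of the object
        info = self.student_information
        return f"{info.first_name} {info.last_name}"

addresses = BoundedArray(4, ["6000 J Street", "Sacramento, CA"])
student = Student("123456789", PersonalInformation("Pat", "Lee", addresses))
print(student.full_name(), len(student.student_information.address))
```

Appending a fifth address raises an error, which mirrors the upper bound declared in varray(4).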
Embedding a user-defined data type
• Each attribute has a data name and a data type; address and zip_code use the user-defined types contact_addresses and contact_zip_codes
  SQL> create or replace type personal_information as object
    (first_name varchar2(20),
     middle_name varchar2(20),
     last_name varchar2(30),
     address contact_addresses,
     zip_code contact_zip_codes);
    /
  Type created.

Embedding an ADT
• The ADT (user-defined) personal_information is used as a column data type
  SQL> create table employees
    (employee_id char(6) primary key,
     employee_address personal_information);
  Table created.
  SQL> create table vendors
    (vendor_id char(5) primary key,
     employee_address personal_information);
  Table created.

Table with ADT
  SQL> describe employees;
   Name               Null?     Type
   ------------------ --------  ----------------------
   EMPLOYEE_ID        NOT NULL  CHAR(6)
   EMPLOYEE_ADDRESS             PERSONAL_INFORMATION

  SQL> describe vendors;
   Name               Null?     Type
   ------------------ --------  ----------------------
   VENDOR_ID          NOT NULL  CHAR(5)
   EMPLOYEE_ADDRESS             PERSONAL_INFORMATION
[Figure: each table pairs its key (Employee_ID or Vendor_ID) with the Employee_address ADT]

Creating an Object Table
  SQL> create or replace type personnel as object
    (employee_id char(7),
     manager personal_information,
     rank varchar2(5));
    /
  Type created.
  SQL> create table managers of personnel;
  Table created.

  SQL> describe managers;
   Name               Null?     Type
   ------------------ --------  ----------------------
   EMPLOYEE_ID                  CHAR(7)
   MANAGER                      PERSONAL_INFORMATION
   RANK                         VARCHAR2(5)

  SQL> describe personal_information;
   Name               Null?     Type
   ------------------ --------  ----------------------
   FIRST_NAME                   VARCHAR2(20)
   MIDDLE_NAME                  VARCHAR2(20)
   LAST_NAME                    VARCHAR2(30)
   ADDRESS                      CONTACT_ADDRESSES
   ZIP_CODE                     CONTACT_ZIP_CODES

Object Reusability
• Create a second table using PERSONNEL
  SQL> create table executives of personnel;
  Table created.

  SQL> describe executives;
   Name               Null?     Type
   ------------------ --------  ----------------------
   EMPLOYEE_ID                  CHAR(7)
   MANAGER                      PERSONAL_INFORMATION
   RANK                         VARCHAR2(5)

Object Tables
[Figure: the object tables Executives and Managers are built on the object type Personnel, which contains Employee_ID (built-in data type) and the ADT Personal_information, which in turn contains the ADTs Contact_addresses and Contact_zip_codes]

Methods
• Map method: returns a single value that is used to compare and order objects of the type
• The method is declared in the object type and implemented in a type body
  SQL> create or replace type transactions as object
    (trans_id number,
     trans_date date,
     map member function get_date return date);
    /
  SQL> create or replace type body transactions as
    map member function get_date return date is
    begin
      return trans_date;
    end;
    end;
    /
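The idea behind a map method can be sketched outside Oracle as well: the object is reduced to one comparable value, and every comparison or sort goes through that value. The Python class below is an analogy under that assumption, not Oracle code; the Transaction class and its sample dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Transaction:
    trans_id: int
    trans_date: date

    def get_date(self):
        """Plays the role of the map method: one value that represents the object."""
        return self.trans_date

    def __lt__(self, other):
        # Comparisons are delegated to the mapped value, not to the whole object.
        return self.get_date() < other.get_date()

transactions = [
    Transaction(1, date(2005, 11, 28)),
    Transaction(2, date(2005, 1, 15)),
    Transaction(3, date(2005, 6, 1)),
]

ordered_ids = [t.trans_id for t in sorted(transactions)]
print(ordered_ids)
```

Sorting the list orders the transactions by date even though the class never states how to compare trans_id values.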