Desirable features in an information system
• Integrity
• Referential integrity
• Data independence
• Controlled redundancy
• Security
• Privacy

File systems
• Sequential or serial
• Indexed sequential
• Relative

Database definition
• A computerised record-keeping system
• Used by a range of users who have different requirements
  – minimal enquiries
  – in-depth updating
  – restructuring
• A well-implemented database will have data integrity, data independence, controlled redundancy, security and privacy, for all users.

Uses of a Database
• Generally used for on-line transaction processing (OLTP)
• Data Warehouses are a hybrid of databases which are used for on-line analytical processing (OLAP)

Structure of a database
• External Schema
• Conceptual Schema
• Internal Schema
• Physical Schema

External level
• Level visible to user
• Multiple views of the system
  – e.g. View an order - see limited product and customer information
• Only the database administrator may access the whole database at this level

EXTERNAL SCHEMA
• Each external view is defined by means of an external schema
• Provides definitions of each external view
• Written in a Data Definition Language
• Individual to the user
• Accessed through a 3GL, a query language or a special-purpose forms or menu-based language

Conceptual level
• CONCEPTUAL - represents the entire information content of the database
• Consists of multiple types of conceptual record. This level preserves the data independence of the database.
• CONCEPTUAL SCHEMA - defines each of the various types of conceptual record, in a conceptual Data Definition Language.

Internal level
• INTERNAL - a low-level representation of the entire database; it consists of multiple occurrences of multiple types of internal record. It is the stored record, inasmuch as it contains all but the device-specific information on the storage of the database.
• PHYSICAL - the physical device and block addresses for each of the records.
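The external level described above (a user who views an order sees only limited product and customer information) can be sketched with an SQL view over a base table. This is a minimal illustration only: it uses Python's built-in sqlite3 module, and the ORDERS table, its columns and the ORDER_VIEW name are hypothetical, not taken from these notes.

```python
import sqlite3

# In-memory database standing in for the conceptual level (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ORDERS (
    ORDER_NO              INTEGER PRIMARY KEY,
    CUSTOMER_NAME         TEXT,
    CUSTOMER_CREDIT_LIMIT INTEGER,
    PRODUCT_NAME          TEXT,
    PRODUCT_COST_PRICE    REAL,
    QUANTITY              INTEGER
);
INSERT INTO ORDERS VALUES (1, 'Murphy', 5000, 'Tyre', 31.50, 4);

-- External view: exposes only part of each conceptual record.
CREATE VIEW ORDER_VIEW AS
    SELECT ORDER_NO, CUSTOMER_NAME, PRODUCT_NAME, QUANTITY
    FROM ORDERS;
""")

# A user of the external view never sees the credit limit or the cost price.
cursor = conn.execute("SELECT * FROM ORDER_VIEW")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
print(view_rows)
```

The base table plays the part of the conceptual record and the view plays the part of one external schema; several different views could be defined over the same table for different users.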
Mappings
• Each level maps onto adjoining levels
• Conceptual / internal mapping specifies how conceptual records and fields are represented at the internal level
• Changes can be made in the internal level without affecting the conceptual level
• External / conceptual mapping defines the correspondence between an external view and the conceptual view

DBMS - Database Management System
• Software handling access to the database
• Gives the database administrator and every user the access to the database to which they are entitled

How requests are processed
• User issues request (e.g. through SQL)
• DBMS intercepts and analyses request
• DBMS inspects user's external schema, external to conceptual mapping, conceptual schema, conceptual to internal mapping and the storage structure definition
• DBMS executes operations on stored database

DATABASE ADMINISTRATOR (DBA)
• Decide on the storage structure and access strategy
• Liaise with the users
• Define security and integrity checks
• Define a backup and recovery strategy
• Monitor and respond to performance

Utilities used by the DBA
• Load routines
• Dump/Restore routines
• Reorganisation routines
• Statistics routines
• Analysis routines
• Data dictionary (containing METADATA, which gives data descriptions and mappings)

Relational database
• Data is independent from programs and from other data
• Data is represented in TABLES rather than files (one entity corresponds to one table)
• Column headings are ATTRIBUTES; each attribute draws its values from a DOMAIN
• Items of information are TUPLES or ROWS rather than records (i.e. occurrences of the entity)

Definitions
• A RELATION is a collection of semantically related information, usually containing a unique key. A RELATION = a Table
• FOREIGN key - a key to a different relation that is used as non-key data in this relation (i.e.
  the enforcing field in the relationship)
• SIMPLE key - uses one item from the row
• COMPOUND key - uses more than one item / attribute
• Unnormalized data - contains headings, footings, differing number of occurrences for different fields

Properties of a relation
• Third Normal Form (TNF) test
  – All row entries are non-divisible (atomic) - i.e. no such thing as arrays
  – All entries in a particular column are drawn from the same set (i.e. no such thing as redefines)

Normalisation of data
• Collect all documents to be entered/produced
• Represent documents in unnormalized form
• Choose and identify key items, giving unnormalized data + keys
• Separate out repeating groups -> 1st Normal Form (1NF)
• Separate out part key dependencies -> 2nd Normal Form (2NF)
• Separate out inter-data and inter-key dependencies -> 3rd Normal Form (TNF)
• Apply TNF tests
• Optimise by combining relations with identical keys
• Apply TNF tests again

Relational database
• This is a database that is perceived by its users as a collection of tables. Each table can define an ENTITY
• Entities can be related through RELATIONSHIPS
• Relationships are implemented by use of foreign keys in tables
• Each column has a unique name within the table
• All rows are distinct (no two are the same)
• Row or column order is not significant
• Every relation must have a key

Operations in SQL
• Tables are created by the CREATE TABLE statement:
    CREATE TABLE DRIVERS
        (DRIVER_NUMBER SMALLINT NOT NULL,
         DRIVER_NAME CHAR(20),
         HOME_DEPOT CHAR(6),
         VEHICLE_TYPE etc...
• Tables can be changed:
    ALTER TABLE DRIVERS ADD OTHER_ALLOWANCES CHAR(6);
• and deleted:
    DROP TABLE DRIVERS;

Operations in SQL
• Tables can be joined together on columns that hold the same attribute:
    SELECT DRIVERS.*, VEHICLE.*
    FROM DRIVERS, VEHICLE
    WHERE DRIVERS.VEHICLE_TYPE = VEHICLE.VEHICLE_TYPE;

Implementation of desirable features
• Integrity
  – A field’s validation can be declared when the field is declared.
    If this validation is used, then the integrity of the field remains intact.
  – Entity integrity - No attribute participating in the primary key of a base relation is allowed to accept null values.
  – Domain constraints - what are the possible valid values that can be used?
• Referential integrity
  – Through the propagation and use of foreign keys, no detail can be created where a master is needed, nor can a master be deleted without consent to the deletion of the details

Implementation of desirable features
• Data independence
  – The implementation of relational databases causes the external and conceptual schema to be data independent. The internal schema and the physical level are data dependent.
• Controlled redundancy
  – The relational model reduces redundancy at the conceptual level

SECURITY
• Legal, social and ethical considerations (e.g. Data Protection Act)
• Physical controls - locking of computer rooms
• Company policy
• Operational - e.g. password access rulings
• Hardware controls - e.g. privileged operating mode
• Limits on fields that users can see

Security and SQL
• SQL allows views to be created that give the view's users access to only a range or selection of values for particular fields; e.g.
    CREATE VIEW CORK_DRIVERS AS
        SELECT DRIVER_NUMBER, DRIVER_NAME, YEARS_SERVICE
        FROM DRIVERS
        WHERE HOME_DEPOT = 'CORK';
• This is a value-dependent constraint.

Security and Privacy in SQL
• Different users can be granted different access rights:
    GRANT SELECT, UPDATE (CREDIT_LIMIT, AMOUNT_OWING) ON TABLE CUSTOMER TO GRP_ACCNTS;
    GRANT SELECT ON VIEW CUSTOMER_TOTAL TO DEPOT_CONTROLLERS;
• The access types that can be granted are SELECT, UPDATE, DELETE and INSERT.
• Access rights can also be REVOKEd.

Security and SQL
• Field-dependent constraints can be imposed by omitting the field from the view.
  Views can also be presented so that they give totals only, not individual items:
    CREATE VIEW CUSTOMER_TOTAL (CREDIT_LIMIT, AMOUNT_OWING) AS
        SELECT CREDIT_LIMIT, SUM(AMOUNT_OWING)
        FROM CUSTOMERS
        GROUP BY CREDIT_LIMIT;

JOURNALLING
• An audit trail can be set up to follow operations on the database. This involves journalling each operation, or each operation of a specific type, on the database or some part of it.
• The audit trail should specify the operation, the terminal from which it was invoked, the user, the date-time, the database, table, record and field affected, and the old and new value of the field.
• The advantage is that it gives the auditors a way of tracing any discrepancies. However, it slows down the operation of the system considerably.

BACKUP SECURITY
• In addition to the database administrator ensuring that the full database is backed up in a logical way, most databases have a COMMIT/ROLLBACK facility:
• Whenever a program updates the database, the update remains tentative only, until a COMMIT causes it to become permanent, or a ROLLBACK cancels it. ROLLBACK is only issued if an exception occurs.

Internal level (relational)
• Internal schema (some Data Definition Language).
    Stored_Driver          Length 41.
    Driver_number          BYTE(6),  Offset 0, INDEX.
    Driver_Name            Byte(20), Offset 6.
    Driver_Home_depot      Byte(1),  Offset 26.
    Driver_vehicle_type    Byte(2),  Offset 28. (**)
    Driver_empl_date       Byte(8),  Offset 30.
    Driver_TFA             Byte(2),  Offset 38.
    Driver_Tax_Table       Byte(1),  Offset 40.

Conceptual schema (some Data Definition Language)
    Driver.
    Driver_number            Character (6).
    Driver_Name              Character (20).
    Driver_Home_depot        Numeric (1).
    Driver_vehicle_type      Character (2).
    Driver_employment_date   Date
    Driver_TFA               Numeric 7 digits 2 decimal
    Driver_Tax_Table         Character 1.

Subschema or External schema (COBOL)
    01 Driver-pay-table.
       02 Driver_no             pic x(6).
       02 Driver_name           pic x(20).
       02 Driver_Vehicle_type   pic xx.
       02 Driver_TFA            pic 9(5)v99.
       02 Driver_Tax_Table      pic A.
       02 Driver_Employ_date    pic 99/99/9999.
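The stored-record layout in the internal schema above can be checked with a short sketch. This is only an illustration using Python's struct module: the sample field values are invented, the byte at offset 27 is not assigned to any field in the listing and is treated here as padding (an assumption), and the encoding of Driver_TFA into 2 bytes is not given in the notes, so placeholder bytes are used.

```python
import struct

# "=" switches off native alignment, so the layout is exactly what is written.
# 6s number, 20s name, 1s home depot, x = one unused byte at offset 27
# (assumed padding), 2s vehicle type, 8s employment date, 2s TFA, 1s tax table.
STORED_DRIVER = struct.Struct("=6s20s1sx2s8s2s1s")

FIELD_NAMES = ["Driver_number", "Driver_Name", "Driver_Home_depot",
               "Driver_vehicle_type", "Driver_empl_date", "Driver_TFA",
               "Driver_Tax_Table"]
FIELD_SIZES = [6, 20, 1, 2, 8, 2, 1]             # Byte(n) from the internal schema
DECLARED_OFFSETS = [0, 6, 26, 28, 30, 38, 40]    # Offset n from the internal schema

# Hypothetical sample record; every value here is made up for illustration.
record = STORED_DRIVER.pack(
    b"D00042",
    b"A. DRIVER".ljust(20),   # pad the name to its fixed width of 20 bytes
    b"3",
    b"HG",
    b"19990104",
    b"\x00\x00",              # placeholder: TFA encoding is not specified
    b"B",
)

def field(stored_record, name):
    """Slice one field out of a stored record using the declared offset and size."""
    index = FIELD_NAMES.index(name)
    start = DECLARED_OFFSETS[index]
    return stored_record[start:start + FIELD_SIZES[index]]

print(STORED_DRIVER.size)                      # total record length in bytes
print(field(record, "Driver_vehicle_type"))
```

The point of the sketch is that only the internal level knows about byte offsets; the conceptual and external schemas name the same fields without any reference to where they sit in the stored record.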
External schema
    01 Driver_location_table.
       02 Driver_no             pic x(6).
       02 Driver_name           pic x(20).
       02 Driver_Vehicle_type   pic xx.
       02 Driver_Home_depot     pic 9.

Data Warehouse
• Definition - a collection of current and historical operational data stored for use in executive support systems (a.k.a. executive information systems, EIS) and decision support systems (DSS).
• Purposes
  – Growing demand that executives and management have rapid, easy access to operational data for planning and decision making
  – Diversity of format and location of historical data

Storage of non-standard data types
• Pictures, Video clips, Sound clips
• Can be done on a relational database. These data types are seen conceptually as just another data type. Only the data is held - i.e. a video clip can be held on a relational database, but separate functionality must be provided to play it - this also applies to sound and still pictures.
• Oracle and Informix call these databases “universal” databases. IBM call them “extenders” to DB2.

Distributed databases
• Databases can now be distributed over different computers and operating systems by the use of middleware
  – Open DataBase Connectivity (ODBC)
• In order for database requests to be passed from one computer to the other, special software is supplied that will translate the client computer’s request into a format understood by the target server computer. The reply is then converted back. This layer of software is called middleware.

ODBC
• This middleware provides only database connectivity; there is a generally accepted ODBC (Open Database Connectivity) standard. This increases scalability.
• ODBC connects to relational database management systems, but not to flat files, thereby excluding a lot of legacy systems.
• All the major RDBMS vendors are offering software to link their databases to the Web. Primary examples are Oracle’s Network Computing Architecture and Informix’s Universal Web Architecture.
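Returning to the storage of non-standard data types: the idea that a clip is "just another data type" to the database, with playback left to separate software, can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module and its BLOB column type; the MEDIA_CLIPS table and its columns are hypothetical, and the bytes are a stand-in rather than a real video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the clip is held in an ordinary column of type BLOB.
conn.execute("""
    CREATE TABLE MEDIA_CLIPS (
        CLIP_ID   INTEGER PRIMARY KEY,
        CLIP_KIND TEXT NOT NULL,
        CLIP_DATA BLOB NOT NULL
    )
""")

# Stand-in for the contents of a real video file (1024 arbitrary bytes).
fake_clip = bytes(range(256)) * 4

conn.execute(
    "INSERT INTO MEDIA_CLIPS (CLIP_KIND, CLIP_DATA) VALUES (?, ?)",
    ("video", fake_clip),
)
conn.commit()

# The database returns the bytes unchanged; it offers no way to play them.
kind, stored = conn.execute(
    "SELECT CLIP_KIND, CLIP_DATA FROM MEDIA_CLIPS WHERE CLIP_ID = 1"
).fetchone()
print(kind, len(stored))
```

Anything beyond storing and retrieving the bytes (decoding, playing, displaying) has to be supplied by the application, which is the separate functionality the notes refer to.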