
INFS1603Notes

INFS1603 Notes - Ben Munns
Chapter 1: Database Systems
1.2 Data vs. Information
- Data - Raw facts concerning things such as people, places, events or concepts
- Raw bits and bytes that do not yet have meaning
- Must be properly formatted for storage, processing, and presentation
- Information - Data that has been processed and presented in a form suitable for human interpretation, often
with the purpose of revealing trends or patterns
- Refined, with context, makes sense to people
­ Knowledge​ ­ The body of information and facts about a specific subject
­ Data management​ ­ a discipline that focuses on the proper generation, storage, and retrieval of data
1.3 Introducing the database
­ Database​ ­ a ​collection of data​ that exists over a long period of time
­ A database includes:
­ End user data​ ­ Raw facts of interest to the end user. The data you want to store.
­ Metadata​ ­ Data that describes what type of data is in the DB and where its stored
­ Can be used to define structure/requirements of data
­ “Data about data”
­ Data dictionaries show metadata of DB
­ Purpose: help people keep track of things
­ Database Management System (DBMS)​ ­ Collection of programs that manages the database
structure and controls access to the data stored in the database
­ e.g. Oracle
- Data in context:
­ SAP: all business data
­ Google: Google searches go through Google DB, cached versions stored in DB
­ Amazon: DB keeps info on products (price, quantity, seller etc.), user accounts (name, credit
card info)
­ Facebook: Personal data, location info
Types of Databases
­ DBs can be classified according to the ​number of users​, the ​location​, ​extent of use, type of user​,
etc.
­ Number of users:
- Single-user database (e.g. Personal computer DB)
- Multi-user database (e.g. Workgroup DB (<50 workers), Enterprise DB (>50 workers))
­ Location:
­ Centralised database ​­ supports data located at a single site
­ Distributed database ​­ supports data distributed across several different sites
­ How they will be used and on time sensitivity of info gathered from them:
­ Operational DB​ ­ support a company’s day­to­day operations
­ Data warehouse​ ­ Storing data used to generate info required to make tactical/strategic
decisions
­ Degree to which data is structured:
- Unstructured data - Data that exist in their original (raw) state (the format in which they were collected)
­ Structured data​ ­ Result of taking unstructured data and formatting such data to facilitate
storage, use, and generation of info
­ Semistructured data​ ­ Data that have already been processed to some extent
­ XML database​ ­ supports storage and management of semistructured XML data
­ Example DBs include:
­ Internet, Intranet and Extranet DB
Basic Terminology
­ Character​ ­ most basic element of data
­ Field​ ­ contains data (composed of characters) (e.g. Name)
­ Record​ ­ set of related fields (e.g. first name, last name, etc. of one user)
­ Database​ ­ collects related (somewhat logical) records
­ DBMS ​­ manages the database
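The terminology above can be sketched with a small, illustrative example (table and values are invented, not from the notes): each column is a field, each row a record, and the catalog entries describing the table are metadata, the "data about data" a data dictionary records.

```python
import sqlite3

# Hypothetical STUDENT table, used only to illustrate the terminology.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")

# End-user data: the raw facts of interest (one record, two fields).
conn.execute("INSERT INTO student VALUES (1, 'S. Law')")

# Metadata: the structure of that data, read from SQLite's catalog.
fields = [(name, col_type) for _, name, col_type, *_ in
          conn.execute("PRAGMA table_info(student)")]
```

Here `PRAGMA table_info` plays the role of a data dictionary entry: it describes what type of data is in the table and how it is structured, independently of the stored facts themselves.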
1.4 Why database design is important
­ Database design​ ­ the activities that focus on the design of the DB structure that will be used to store
and manage end­user data
- Requires the designer to identify precisely the DB's expected use (affects its focus)
­ Appropriate data repositories and relationships must be carefully considered and implemented
­ A well­designed DB facilitates data mgmt and generates accurate and valuable info
­ Poorly designed ­> errors and bad decisions
1.5 Evolution of File System Data Processing
History of handling data
­ Manual filing​ systems
- Computerised filing systems via data files
­ Database systems
Manual filing
­ Papers within systems organised in order to facilitate expected use of data
­ As orgs grew + reporting requirements became more complex ­> keeping track of data in a manual file
system more difficult
File System Data Management
- Data processing (DP) specialist hired to create a computer-based system that would track data and
produce required reports
- Initially computer files were similar to manual files
- When business users wanted data from the computerised file -> request for data to DP specialist
- DP specialist would create program to retrieve data, manipulate to user request, and present as a
printed report
- As more computerised files were developed -> lots of data files contained related, overlapping data with no
means of controlling or managing the data consistently across all files
Problems with File Systems
- 3rd generation programming language (3GL) skills required, which are expensive
­ Data is handled by programs (as in the model)
­ Skills to organise data are not standardised ­> programmers must be familiar with the file
system (standardised SQL makes it easier to transfer workers)
­ Lengthy development times ­> difficult to get quick answers
­ System ​administration​ is ​difficult​ as number of files expands (requires multiple file management
programs)
­ Structural dependency​ ­ access to the file is dependent on its structure
­ Data dependency​ ­ Changes in data types require changing all the programs that access the
file
­ Each file must have its own file management system.
­ Modifications are likely to produce bugs
­ Data ​redundancy, inconsistencies​ and ​anomalies (​ modification anomalies, insertion anomalies
and deletion anomalies)
­ Data redundancy​ ­ exists when the same data is stored unnecessarily at different
places
­ Data inconsistency​ ­ exists when different and conflicting versions of the same data
appear in different places
­ Data anomaly​ ­ develops when not all of the required changes in the redundant data are
made successfully
­ Modification anomalies​ ­ Updated data in one file not reflected in others
­ Insertion anomalies​ ­ New data in one file not inserted in others
­ Deletion anomalies​ ­ Deleted data in one file not deleted in others
- Data integrity - The condition in which all of the data in the DB are consistent with the
real-world events and conditions, i.e. data is accurate and verifiable
­ Lack​ of ​security​ ­ Not centralised ­> only as safe as security implemented
­ Limited data sharing
­ Can’t have several computers/programs accessing the same data
­ Update issues
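A minimal sketch of redundancy and a modification anomaly (the data is invented for illustration): when the same fact, here a customer's address, is stored with every order, as in separate data files, an update that reaches only one record leaves conflicting versions behind.

```python
# Redundant storage: Ann's address is repeated in every order record,
# as it would be across uncoordinated data files.
orders = [
    {"order_id": 1, "customer": "Ann", "address": "1 High St"},
    {"order_id": 2, "customer": "Ann", "address": "1 High St"},
]

# Modification anomaly: the update is applied to one record only.
orders[0]["address"] = "9 Low Rd"

# Data inconsistency: two conflicting versions of the same fact.
versions = {o["address"] for o in orders if o["customer"] == "Ann"}
```

Storing the address once, in a central repository keyed by customer, is exactly what the DBMS approach in the next section enables.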
Database Management System (DBMS)
- DBMS - A data storage and retrieval system which permits data to be stored non-redundantly while
making it appear to the user as if the data is well integrated
- Means all data can be stored once (central repository)
- Can restrict read/write access using user rights
- DBMS serves as the intermediary between the user and the database
- Hides much of the database's internal complexity from application programs and users
­ Advantages of DBMS
­ ↓​ ​data ​inconsistency/anomalies
­ Data inconsistency​ ­ when different
versions of the same data appear in
different places
­ ↓​ data ​redundancy​ and ​↑​ data sharing
­ ↑ ​end user ​productivity
­ ↓​ data/structural​ dependency​ problems
­ Data independence​ ­ possible to change data type without affecting application
program’s ability to access the data
­ Structural independence​ ­ Possible to make changes in the file structure without
affecting the application program’s ability to access the data
­ Easier ​to create access, modify and delete data
­ Access through ad hoc queries
­ Enforces ​standards
­ Central​ security
­ Standardised ​backup​ and ​recovery
­ Concurrency​ handling (several computers interacting)
­ ↑ ​decision making with higher quality data
­ ↑​ data sharing
­ Disadvantages of DBMS
­ Increased costs​ for hardware, software, personnel
­ Management complexity​ e.g. different interfaces, security
­ Maintaining currency​ (keeping DB current through updates, patches etc.)
­ Vendor dependence
­ Frequent upgrade/replacement cycles
1.7 Database Systems
- The DB system consists of logically related data stored in a single logical data repository
- Centralised DB -> eliminates most of the file system's data inconsistency, anomaly, data dependence and
structural dependence problems
Database Environment
­ Database System​ ­ An organisation of components that define and regulate the collection, storage,
mgmt, and use of data within a DB environment
- Must be tactically and strategically sound, and cost-effective
­ Can be created and managed at different levels of complexity with varying adherence to precise
standards
Five components:
­ Hardware​ ­ All of the system’s physical devices
­ e.g. computers, storage devices, printers, network devices, etc.
­ Software​ ­ Three types of software needed to make DB function:
­ OS​ ­ manages all hardware components (e.g. Windows, OS X, Linux)
- DBMS software - Manages the DB within the DB system (e.g. Oracle, MySQL)
­ Applications and utilities​ ­ Used to access and manipulate data in the DBMS and manage the
computer environment in which data access and manipulation take place
­ People​ ­ All users of the DB system. 5 types of users:
­ System Admins​ ­ Oversee the DB system’s general operations
­ DB admins​ ­ manage the DBMS and ensure that the DB is functioning properly
­ DB designers​ ­ design the DB structure
­ System analysts/programmers​ ­ Design and implement the app programs (e.g. data entry
screens, reports, etc.)
­ End users​ ­ People who use the application programs to run the orgs daily operations (e.g.
Managers)
­ Procedures​ ­ The instructions and rules that govern the design and use of the DB system. Enforce the
standards.
­ Data​ ­ the collection of facts stored in the DB
DBMS Functions
­ Data dictionary mgmt​ ­ stores definitions of data elements and their relationships (metadata) in a data
dictionary
­ DBMS provides data abstraction, and it removes structural and data dependence from the
system
­ Data storage mgmt​ ­ Provides storage not only for data but for related data entry forms or screen
definitions, report definitions, data validation rules, etc.
­ also important for ​performance tuning​ ­ Activities that make the DB perform more efficiently in
terms of storage/access speed
­ Data transformation/presentation​ DBMS formats the physically retrieved data to make it conform to
the user’s logical expectations
­ Security mgmt​ ­ DBMS creates a security system that enforces user security/privacy
­ User access and operation (read, add, delete, modify) rules
­ Multi­user access control​ ­ Multiple users can access the DB concurrently without compromising the
integrity of the DB
­ Backup and recovery mgmt​ ­ provides to ensure data safety/integrity
­ Data integrity mgmt​ ­ DBMS promotes/enforces integrity rules ­> minimising data redundancy +
maximising data consistency
­ DB access languages and application programming interfaces​ ­ DBMS provides data access
through a non­procedural query language (user specifies what is to be done, not how its done) e.g.
SQL)
­ DB communication interfaces​ ­ Accept end­user requests via different network environments
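The "what, not how" point about non-procedural query languages can be sketched as follows (table and values are illustrative assumptions, not from the notes): the query names the rows wanted; the DBMS decides the retrieval strategy.

```python
import sqlite3

# Hypothetical EMPLOYEE table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Ann", "Sales"), (2, "Ben", "IT"), (3, "Cal", "Sales")])

# Declarative SQL: we specify WHAT we want (Sales staff), not HOW to
# scan files, follow pointers, or loop over records.
names = [row[0] for row in conn.execute(
    "SELECT name FROM employee WHERE dept = 'Sales' ORDER BY name")]
```

Contrast this with the file-system model, where a DP specialist would write a procedural 3GL program spelling out every retrieval step.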
Managing the database system: A shift in focus
­ The role of the human components changed from emphasis on programming (in file system) to focus
on the broader aspects of managing the orgs data resources
Chapter 2: Data Models
2.1 Data Modeling and Data Models
­ Data Model​ ­ A relatively simple representation, usually graphical, of more complex real­world data
structures
­ Represents data structures and their characteristics, relations, constraints, transformations, and
other constructs with the purpose of supporting a specific problem domain
­ “An abstraction of the real world”
­ Data modelling​ ­ Simple representation of complex world structures
­ An iterative, progressive process
- Can be classified based on their degree of abstraction:
- External
- Conceptual
- Internal
- Physical
2.2 The Importance of Data Models
­ Data models are a communication tool (between designers, programmers, and end users)
­ Data is viewed in different ways by different people BUT when a good DB blueprint is available, it does
not matter if views are different
2.3 Data Model Basic Building Blocks
­ Entity​ ­ Anything (person, place, thing, event) about which data are to be collected/stored
­ Entity type​ ­ the general (e.g. Person)
­ Entity instance​ ­ a particular example (e.g. Daniel)
­ Attribute​ ­ A characteristic of an entity
­ Relationship​ ­ An association among entities. Can be 1:M, M:N, 1:1
­ Constraint​ ­ A restriction placed on the data. Help ensure data integrity. Normally expressed in the
form of rules
2.4 Business Rules
­ Business rule​ ­ A brief, precise, and unambiguous description of a policy, procedure, or principle
within a specific organisation
­ Help to create and enforce actions within that orgs environment
­ Used to define entities, attributes, relationships, and constraints
­ Must be easy to understand and widely disseminated
Discovering Business Rules
­ Main sources of business rules are company managers, policy makers, department managers, and
written documentation
­ Direct interviews with end users are quick but may be less reliable
­ It pays to verify end­user perceptions
­ The process of identifying and documenting business rules is essential because:
­ Helps standardise company’s view of data
­ Communications tool between users and designers
­ Allow designer to understand the nature, role, and scope of the data
­ “” understand business processes
­ “” develop appropriate relationship participation rules/constraint
Translating Business Rules into Data Model Components
­ As a general rule, a noun in a business rule ­> entity, a verb ­> relationship
­ To properly identify the type of relationship, you should consider that relationships are bidirectional
­ Ask two Q’s:
­ How many instances of B are related to one instance of A?
­ How many instances of A are related to one instance of B?
Naming Conventions
­ Make objects unique and distinguishable from other objects
­ Entity names ­ Descriptive of the objects in the business environment, use familiar terminology
­ Attribute names ­ Descriptive of the data represented (also good to prefix with name of the entity)
­ Proper naming convention ­> self documenting
2.5 The Evolution of Data Models
­ Implementation models
­ Hierarchical DB models (not covered)
­ Network DB models (not covered)
­ Object­oriented DB models
­ Relational DB models
­ Conceptual Models covered in this course
­ Entity­relationship (ER) model
­ Object­oriented (OO) model
Hierarchical and Network Models
­ Hierarchical Model​ ­ Developed to manage large amounts of data for complex manufacturing projects
- Basic logic represented by an upside-down tree, contains levels (segments)
- Within the hierarchy, a higher layer is perceived as the parent of the segment directly beneath it,
called a child
Advantages​:
- Data retrieval can be fast
­ 1:M promotes data ​integrity
­ High ​security
­ Efficiency with 1:M​ fixed relationships
Disadvantages
­ Cannot support M:N relationships​ (not all situations call for only 1:M relationships)
­ Data ​dependency
­ No data definition​ or ​manipulation language
Network Model - Created to represent complex data relationships more effectively
- Allows a record to have more than one parent
Advantages:
- Handles M:N relationships (better reflects real life)
- Owner/member relationship promotes database integrity
- Data access and flexibility better than in the hierarchical model
Disadvantages​:
­ Difficult to design
­ Difficult to change​ once implemented
­ Data requests ​require highly technical skills​ (Programmers might have those, but
managers?)
­ Overall ​expensive
The Relational Model
­ Introduced in 1970 by E.F. Codd
- The DB only requires entities and the relationships between those entities
- Info is stored regarding entities and how they relate
- Relational diagram - A representation of the relational DB's entities, the attributes within those entities,
and the relationships between those entities
­ Advantages:
­ Ability to ​simplify complex relationships
­ Data ​independent
- Relatively easy to design and re-design the database
- Sophisticated Structured Query Language (SQL) leads to ability to implement ad hoc queries
- Disadvantages:
- Need for specialised staff
- Development, installation, maintenance and security costs
The Entity Relationship model
­ ER Model​ ­ A detailed, logical representation of the data for an
org or for a business area
­ Expressed in terms of ​entities ​in the business environment, the
relationships​ or associations among those entities, and the
attributes ​of both the entities and their relationships
- Normally represented using an ER diagram, a graphical representation of the ER Model. Two
notations:
- Chen notation (used in this course) - favours conceptual modeling
- Crow's foot - favours a more implementation-oriented approach
Object­Oriented Data Model (OODM)
- Uses the data approach to program, develop classes etc. and how they interact
- Data and relationships exist in a single structure known as an object
- OODM is the basis for the object-oriented database management system (OODBMS)
- OODM is a semantic model
- Contains meaning on relationships between facts in an object as well as info about
relationships with other objects
- Specialised for certain problems
- OODM allows an object to contain all operations that can be performed on it
­ OO Terminology:
­ Object​ ­ Abstraction of a real­world entity
­ Attributes​ ­ Describe properties of an object
­ Classes​ ­ Objects of similar characteristics
- Unified Modeling Language (UML) - based on OO concepts; describes diagrams and symbols used to
graphically model a system
The Future of Data Models
­ Hybrid DBMSs​ ­ Retain adv. of relational model, provide object­oriented view of underlying data
­ SQL data services​ ­ Store data ​remotely​ without incurring expensive hardware, software, and
personnel costs
­ Companies operate on a “​ pay­as­you­go”​ system/​cloud​­based system
Data Models: A summary
­ Common characteristics of data models to be accepted:
­ Some degree of conceptual simplicity without compromising the semantic completeness
- Must represent the real world as closely as possible
- Behaviour must be in compliance with consistency/integrity characteristics of any model
2.6 Degree of Data Abstraction
­ Data abstraction​ ­ ​reduction of a particular body of data to a simplified representation of the whole
External Model
­ External Model​ ­ the end users’ view of the data environment
­ Subsets of database based on permissions
- A specific representation of an external view is known as an external schema
- Advantages of using external views:
- Easy to identify specific data required for each business unit
- Makes designer's job easy by providing feedback about model's adequacy
- Ensure security constraints in the DB design
- Makes application program development much simpler
Conceptual Model
­ Conceptual Model​ ­ a global view of the entire DB as viewed by the entire org (i.e. integrates all
external views)
­ Basis for identification and high­level description of the main data objects
­ Uses two techniques:
­ ER Modelling​ ­ Top­down approach. Begins by looking for the data groups in the system
­ Based off the real world
­ Normalisation​ ­ Bottom­up approach. Begins by looking at the smallest individual items of data
recorded by the system
­ Building on first approach, fine tuning
­ Advantages of conceptual model:
­ Provides a relatively easily understood bird’s­eye (macro level) view of the data environment
­ Logical design​ ­ Both ​software independent​ (model does not depend on DBMS) and
hardware independent​ (model does not depend on hardware used in implementation)
­ ∴​ changes can be made with no effect on database design
Internal Model
­ Internal Model​ ­ Representation of the database as “seen” by the DBMS
­ Used when database is implemented
­ Internal Schema​ depicts specific representation of an internal model, using the database constructs
supported by the chosen database
­ i.e. depends on specific database software
­ ∴ ​A change in DBMS software ­> internal model must change
­ Logical independence​ ­ You can change the internal model without affecting conceptual model
Physical Model
­ Physical Model​ ­ operates at the lowest level of abstraction, describing the way data are saved on
storage media such as disks or tapes
­ Definition of both physical storage devices and (physical) access methods required
- Precision required -> DB designers who work at this level have detailed knowledge of
hardware/software
­ Relational model logical ­> does not require physical­level details
­ Implementation​ of relational model may require physical­level fine­tuning for ​↑ ​performance
Chapter 4: Entity Relationship (ER) Modeling
4.1 The Entity Relationship Model (ERM)
Entities
­ Entities​ ­ an object about which the system requires to hold data
­ Entity type​ (class) ­ a collection of entities that share
common properties or characteristics (e.g. Person)
­ Entity instance​ ­ A single occurrence of an entity type
Attributes
­ Attributes​ ­ A property or characteristic of an entity that is of interest to the org
­ Each entity type has a set of general attributes associated with it
­ e.g. STUDENT has “Student ID”, “Student Name”, ...
­ Each entity instance has specific values of the attributes associated with it
­ e.g. S. LAW has “S221”, “Law, S.”, …
­ Can be ​required​ or ​optional
­ Attributes have ​domains​ (the attributes set of possible values)
­ Types of attributes:
­ Composite attribute​ ­ Super­set of sub­attributes (e.g. Address (= street, city, state and area
code))
- Composite key - Two attributes used together to identify an instance (a composite PK) (e.g. Flight_ID)
­ Simple attribute​ ­ cannot be subdivided (e.g. Student_ID)
­ Single­valued attribute​ ­ only has one value (simple or composite)
­ Multi­valued attribute​ ­ Can have many values (e.g. Skill)
­ To split ­ make new attributes for each instance OR make new entity
­ Represented using double lines
­ Derived attribute​ ­ Derived using an algorithm (not physically stored) (e.g. Years employed)
­ Represented using a dotted line
­ Can be as simple as adding two attribute values
- Key attribute - Unique, so as to identify the entity
­ e.g. zID, Telephone
Keys
­ Key​ ­ An attribute/set of attributes whose values uniquely identify one occurrence of that entity
­ Candidate Key​ ­ an attribute that ​uniquely identifies each instance​ of an entity type (potential key)
­ Primary Key (PK)​ ­ ​Candidate key that has been selected​ to be used as an identifier for an entity type
­ Key you actually use
­ Characteristics of a good PK:
­ Unique values​ ­ PK must uniquely identify each entity instance. Cannot contain NULLS
­ Nonintelligent​ ­ PK should not have embedded semantic meaning other than identifying
­ No change over time​ ­ PK should be permanent and unchangeable otherwise update
issues for FKs, etc.
­ Preferably single­attribute​ ­ Simpler for linking FKs
­ Preferably numeric​ ­ Can implement counter style auto­increments
­ Security compliant​ ­ Don’t use sensitive data (e.g. social security number) for ID
- Foreign Key (FK) - An attribute that contains a data item that is the PK of another entity
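The PK characteristics and the PK/FK link can be sketched in SQL (table and column names are illustrative assumptions): the FK column in ENROLMENT holds values found in STUDENT's PK column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# PK: unique, nonintelligent, numeric, single-attribute, no NULLs.
conn.execute("""CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL)""")

# FK: enrolment.student_id must hold a value found in student's PK.
conn.execute("""CREATE TABLE enrolment (
    course_id  TEXT,
    student_id INTEGER REFERENCES student(student_id))""")

conn.execute("INSERT INTO student VALUES (1, 'S. Law')")
conn.execute("INSERT INTO enrolment VALUES ('INFS1603', 1)")  # FK matches a PK
```

An enrolment pointing at a student_id that does not exist in STUDENT would be rejected by the DBMS, which is the referential-integrity behaviour discussed in Chapter 3.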
Relationships
­ Relationship​ ­ A link between two entities (participants) which is significant for the system
­ Relationships always operate in both directions
­ Degree of a relationship​ ­ the number of entity types that participate in that relationship
­ e.g. Unary, Binary, Ternary, Quaternary
­ Relationships can be:
­ One to one
­ One to many
­ Many to many
­ Recursive (in a unary relationship)
- Relationship strength - How the PK of a related entity is defined
- Weak (non-identifying) relationships - PK of the related entity does not contain a PK
component of the parent entity (i.e. entity is independent)
- Strong (identifying) relationships - PK of the related entity contains a PK component of the
parent entity (i.e. entity is dependent/weak)
Connectivity
­ Connectivity​ ­ Describes the relationship classification (e.g. 1:1, 1:M, M:N)
- Indicated on the ER diagram using numeric notation
Cardinality
- Cardinality - The specific number of entity occurrences associated with one occurrence of a related
entity
­ “For example, the cardinality (1,4) written next to the CLASS entity in the “PROFESSOR
teaches CLASS” relationship indicates that each professor teaches up to four classes, which
means that the PROFESSOR table’s primary key value occurs at least once and no more than
four times as foreign key values in the CLASS table.”
­ Indicated by placing appropriate numbers besides the entity using the format (x, y) where x = min and y
= max
­ DBMS cannot handle implementation of cardinalities at the table level ­ provided by the application
software or by triggers
­ Cardinality constraint​ ­ The number of instances of entity A that can be associated with each instance
of entity B
­ Derived from business rules
­ Minimum cardinality​ ­ Minimum number of instances of one entity that is associated with each
instance of another entity
­ Maximum cardinality​ ­ Maximum number of instances…
- Relationship participation - A participating entity in a relationship can be either optional or mandatory
- Determined by specific meaning of the terms used (depends on context, need to state
assumptions)
- If Entity A has an optional relationship with Entity B, it is represented with a circle on the ER diagram
Weak Entities
- Weak entity - An entity that relies on the existence of another entity. It has a PK that is partially or
totally derived from the parent entity
- Indicated on ER Diagram using a double-walled entity rectangle
- Implemented in the DBMS if an entity has a mandatory FK
- Meets two conditions:
- Existence-dependent - Cannot exist without the entity with which it has a relationship
- Has a PK that is partially or totally derived from the parent entity in the relationship
- DB Designer usually determines whether an entity can be weak based on business rules
- If it is existence-independent (exists apart from related entities) -> strong (or regular) entity
Composite Entity
­ Composite entity​ ­ An entity type that associates the instances of one or more entity types. Contains
attributes that are peculiar (singular) to the relationship between those entity instances
­ Turn a relationship into an entity for additional info on relationships
- M:N relationships should be avoided as relational databases can only handle 1:N relationships
- M:N relationships should be decomposed to 1:M relationships via a composite entity
- The composite entity:
- Builds a bridge between the original entities
- Composed of the PKs of the original entities
- Is existence-dependent on the original entities
- May contain additional attributes
- Makes it easier to add info (new rows rather than columns)
- Surrogate key - Not derived from data but artificially created for the composite entity - AVOID!
- Stops cascading delete as composite entity is no longer reliant on FKs
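The decomposition can be sketched as follows (entity and attribute names are illustrative assumptions): an M:N "STUDENT takes COURSE" relationship becomes two 1:M relationships through a composite ENROLMENT entity whose PK is composed of the original entities' PKs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE course  (course_id  TEXT PRIMARY KEY)")

# Composite (bridge) entity: 1:M from STUDENT, 1:M from COURSE.
conn.execute("""CREATE TABLE enrolment (
    student_id INTEGER REFERENCES student(student_id),
    course_id  TEXT    REFERENCES course(course_id),
    grade      TEXT,                      -- attribute peculiar to the link
    PRIMARY KEY (student_id, course_id))  -- composed of the original PKs
""")

conn.execute("INSERT INTO student VALUES (1)")
conn.execute("INSERT INTO course VALUES ('INFS1603')")
conn.execute("INSERT INTO enrolment VALUES (1, 'INFS1603', 'HD')")
```

Each new student-course pairing is simply a new row in ENROLMENT, which is the "new rows rather than columns" point above.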
Supertype and Subtype
­ Supertype​ ­ A more ​generic​ entity type compared to its subtypes
­ Subtype​ ­ A more ​specific​ entity type compared to its supertype
­ Inherits ​all attributes of the supertype
­ Has additional, ​specific attributes
­ An instance of a subtype is also an instance of a supertype BUT
an instance of a supertype may or may not be an instance of one
or more subtypes
Generalisation and specialisation
­ Generalisation​ ­ The process of defining a general entity type from a set of specialised entity types
­ Bottom­up ​process from subtypes to supertypes
­ Specialisation​ Defining one or more subtypes of the supertype
­ Top­down​ ​process from supertypes to subtypes
Constraints
­ Completeness constraint​ ­ whether an instance of a supertype must also be an instance of at least
one subtype
­ Total specialisation rule​: Yes!
­ Partial specialisation rule​: No!
- Disjointness constraint - whether an instance of a supertype may simultaneously be a member of
two (or more) subtypes
- Disjoint constraint rule: No!
- Overlap constraint rule: Yes!
- Subtype discriminator(s) - the attribute(s) of the supertype that determine (code, note, identify) the
target subtype
- Disjoint constraint rule: One attribute
- Overlapping constraint rule: composite attribute/several attributes
4.2 Developing an ER Diagram
­ An ​iterative process​, thus, based on repetition of processes and procedures. Usually involves the
following activities
­ Create a detailed narrative of the orgs operations
­ Identify the business rules based on the description of operations
­ Identify the main entities and relationships from the business rules
­ Develop the initial ERD
­ Identify the attributes and PKs that adequately describe the entities
­ Revise and review the ERD
­ During review, likely to uncover new objects, attributes, relationships, etc ­> important
­ During design, DB designer can gain info from interviews BUT also examining business forms/reports
ER Modelling Guideline
­ Data items should be put into ​logical groups
­ For each data group/entity type, there should be a ​key​ that uniquely identifies indv. members of entity
type
­ There should be ​no redundant data​ in the model
­ Ask yourself the following Q’s:
­ What are the relevant entities here?
­ What are the relevant relationships here?
­ Can I generalise some entities?
­ Document your ​assumptions​ as you go
­ Leave ​cardinalities​ until the end
- There is no mechanical procedure; use rules of thumb and intuition. You will need many drafts!
4.3 Database Design Challenges: Conflicting Goals
­ DB designers often make design compromises triggered by conflicting goals such as:
­ Adherence to design standards​ ­ Design standards help guide you in developing logical
structures that minimise data redundancies
- Processing speed - Many orgs prioritise processing speed, i.e. minimal access time,
which may be achieved by minimising the number/complexity of logically desirable relationships
­ Information requirements​ ­ May prioritise info generation which may ­> data transformations
which may expand number of entities/attributes ­> sacrifice “clean” design and/or high speed
­ Design is important BUT must meet end user requirements such as performance, security, shared
access, data integrity, query/reporting needs etc.
­ Documentation is important to understand and modify designs, ensures data compatibility and
coherence
Chapter 3: The Relational Database Model
3.1 A Logical View of Data
Relational Model
- Relational Model - Represents data in a two-dimensional table called a relation. Includes:
- Relations - Two-dimensional tables
- Attributes - The column headers of a relation
- Tuples - The rows of a relation (records, connected)
- The name of a relation (table) and its set of attributes (column headers) are the schema for the relation
- Blueprint, no data
- Database schema (metadata) - the set of schemas for all relations in the design
- Data dictionary - Describes the DB schema
- Usually implemented in a RDBMS (relational database management system) such as Oracle
­ Relation​:
­ Every relation has a ​unique name
- Every attribute value is atomic (no multi-value records)
­ Every row is ​unique
­ Attributes​ in tables have ​unique names
­ Can be same name if in different tables but should refer to the same info
­ Order of the columns/rows​ is ​irrelevant
3.2 Keys
­ Candidate Key​ ­ Any set of one or more columns whose combined values are unique among all
occurrences (i.e. tuples or rows)
­ Primary Key (PK)​ ­ the PK is any candidate key of that table which the DB designer arbitrarily
designates as “primary”
­ Alternate Key​ ­ the AKs are any candidate keys not currently selected as the PK
­ Foreign Key (FK)​ ­ A set of one or more columns in any table which may hold the values found in the
PK column of another table
­ The key’s role is based on ​determination​ (i.e. “A determines B” means if you know A you can
determine the value of B)
­ Determination is used in the definition of ​functional dependence​ ­ ​“The attribute B is
functionally dependent on the attribute A if each value in column A determines one and only one
value in column B.”
3.3 Integrity Rules
­ Three basic types of ​database integrity constraints:
1. Entity integrity​ ­ Requires that each row in a table has a unique PK value (no NULLs)
­ NULLs should be avoided because their meaning isn’t clear; some designers use ​flags
to indicate the absence of some value (e.g. ­99 to show no value has been assigned)
2. Referential integrity​ ­ Requires, for any ​FK​ value, the existence of a corresponding PK in the
referenced table
­ Cascading updates/deletes preserve integrity when related records are changed or deleted
3. Domain integrity​ ­ Restricts the data in a column to its ​predefined data type
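All three constraint types can be declared in SQL. A minimal sketch using SQLite via Python’s sqlite3 module (the notes use Oracle; SQLite stands in here, and the vendor/product tables and column names are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

con.execute("""CREATE TABLE vendor (
    v_code INTEGER PRIMARY KEY,              -- entity integrity: unique, not NULL
    v_name TEXT NOT NULL)""")
con.execute("""CREATE TABLE product (
    p_code  INTEGER PRIMARY KEY,
    p_price REAL CHECK (p_price >= 0),          -- domain integrity via CHECK
    v_code  INTEGER REFERENCES vendor(v_code))  -- referential integrity
""")

con.execute("INSERT INTO vendor VALUES (1, 'Acme')")
con.execute("INSERT INTO product VALUES (10, 9.95, 1)")  # OK: FK matches a PK

try:
    con.execute("INSERT INTO product VALUES (11, 5.00, 99)")  # no vendor 99
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The bad insert is rejected by the DBMS itself, not by application code, which is the point of declaring the constraints in the schema.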
3.4 Relational Set Operators
­ Relational algebra​ ­ Defines the theoretical way of manipulating table contents using the right
relational operations:
­ SELECT ​­ yields values for all rows found in a table that satisfy a given condition (horizontal)
­ PROJECT ​­ yields all values for selected attributes (vertical)
­ UNION​ ­ combines all rows from two tables, excluding duplicate rows (must be ​union compatible
­ tables have same attribute characteristics)
­ INTERSECT​ ­ yields only the rows that appear in both tables
­ DIFFERENCE​ ­ yields all rows in one table that are not found in the other table
­ PRODUCT ​­ yields all possible pairs of rows from two tables (known as a Cartesian product)
­ JOIN ​­ allows info to be combined from two or more tables
­ Inner Join​ ­ only returns matched records from the tables that are being joined
­ Natural Join​ ­ Links tables by selecting only the rows with common values in
their common attributes
­ Equality Join​ ­ Links tables on the basis of an equality condition (=) that
compares specified columns of each table
­ Theta Join​ ­ Use of any other comparison operator (>, <, etc.) to link tables
­ Outer Join​ ­ Matched pairs retained, any unmatched values left null
­ Left Outer Join​ ­ yields all rows from table A, inc. those not matched in table B
­ Right Outer Join​ ­ yields all rows from table B, inc. those not matched in table A
­ Full Outer Join​ ­ yields all rows from table A and table B
­ DIVIDE ​­ Uses one single­column table as the divisor and one 2­column table as the dividend
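The set operators above map directly onto SQL. A small sketch in SQLite via Python’s sqlite3 (table contents are illustrative; note that SQL spells DIFFERENCE as EXCEPT, and the two tables must be union compatible):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE r (id INTEGER, name TEXT);
    CREATE TABLE s (id INTEGER, name TEXT);   -- union-compatible with r
    INSERT INTO r VALUES (1,'Ann'), (2,'Ben');
    INSERT INTO s VALUES (2,'Ben'), (3,'Cat');
""")

# UNION combines all rows and removes duplicates
union = con.execute(
    "SELECT * FROM r UNION SELECT * FROM s ORDER BY id").fetchall()
# INTERSECT keeps only rows appearing in both tables
intersect = con.execute(
    "SELECT * FROM r INTERSECT SELECT * FROM s").fetchall()
# EXCEPT (difference) keeps rows in r that are not in s
difference = con.execute(
    "SELECT * FROM r EXCEPT SELECT * FROM s").fetchall()

print(union)       # all three people, (2,'Ben') only once
print(intersect)   # only the shared row
print(difference)  # rows of r not in s
```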
3.5 The Data Dictionary and the System Catalog
­ Data dictionary​ ­ Provides a detailed description of all tables found within the user/designer­created
database (contains all attribute names/characteristics ­ metadata)
­ System Catalog​ ­ A ​detailed system data dictionary​ that describes all objects within the DB, inc. data
about table names, the table’s creator/creation date, no. of columns in each table, data type of each
column, index file names, index creators, authorised users, and access privileges
­ Automatically produces DB documentation
­ In general terms, ​homonyms​ (same attribute name for different attributes) and ​synonyms​ (different
names to describe the same attribute) must be avoided
3.6 Relationships within the relational database
Conceptual Model to Relational Model
­ In general, ​each entity will be converted to a relation​. The attributes of the entity become the
attributes of the relation
­ Eliminate ​composite and multi­valued attributes
­ Translate each ​entity ​into a r​ elation​ (table)
­ Translate appropriate ​relationships​ into a ​relation​ (others might just be a FK link)
Examples of mapping the ER Diagram to the Relational Model on the next page
3.7 Data redundancy revisited
­ The proper use of FKs does not eliminate data redundancies, but ​minimises​ them
­ Data redundancies can be damaging ­> proper use of FKs reduces this risk
­ Sometimes data redundancies are required, e.g. To preserve historical accuracy of data, make
searching easier
3.8 Indexes
­ Index​ ­ An orderly arrangement used to logically access rows in a table. Composed of an ​index key
(the index’s reference point) and a set of pointers (where the data is)
­ Purposes of indexes in DBMSs:
­ Retrieve data more efficiently
­ Retrieve data ordered by a specific attribute or attributes (e.g. can index customer’s last name
and order alphabetically)
­ Unique index​ ­ an index in which the index key can have only one pointer value (row) associated with
it (e.g. the PK)
­ A table can have many indexes, but each index is associated with only one table
­ Index key can have multiple attributes (composite index)
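A quick sketch of ordinary and unique indexes, again using SQLite through Python’s sqlite3 as a stand-in (customer table and index names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (cus_code INTEGER, cus_lname TEXT)")

# Ordinary index: speeds up retrieval/ordering by last name
con.execute("CREATE INDEX idx_lname ON customer (cus_lname)")

# Unique index: each index key may point to at most one row
con.execute("CREATE UNIQUE INDEX idx_code ON customer (cus_code)")

con.execute("INSERT INTO customer VALUES (1, 'Smith')")
try:
    con.execute("INSERT INTO customer VALUES (1, 'Jones')")  # duplicate key
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # the unique index blocks the duplicate
```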
3.9 Codd’s Relational Database Rules
­ Published in 1985 by Dr. E. F. Codd to define a relational database as vendors were marketing
products as relational when they were not.
­ Note: even the dominant DB vendors do not fully support all 12 rules
Chapter 6: Normalisation of Database Tables
6.1 Database Tables and Normalisation
Logical Data Modelling
­ Conceptual Data Model​ ­ Represents the ​conceptual view​ of org data (e.g. ER Model)
­ Logical Data Model​ ­ Describes org data in a way that could be used for ​implementation​ in a DBMS
(e.g. Relational Model).
­ The logical model is still independent of any particular DBMS
Redundancy
­ DB designers aim to ​reduce redundancy​ (i.e. DB should not store same data several times) to save
space and prevent problems
­ Aim for the ​rule(s) of one:
­ One ​type of item/entity type​ = (only) one ​relation/table
­ One ​item/entity instance​ = (only) one ​tuple/row
­ One ​fact/attribute​ about an entity = (only) one ​attribute/column
­ Each attribute should describe (only) the entity type (relation/table) it belongs to
­ To achieve these aims, we use ​normalisation techniques
Normalisation
­ Normalisation​ ­ A process for converting complex data structures (relations) into simpler, more stable
data structures
­ “Don’t add columns, add rows”
­ Normalisation:
­ Is a ​process​ that is accomplished in ​stages
­ Is a technique that is used to ​define “goodness”​ (or “badness”) of a relation
­ Results in data structures that have some ​desirable​ (“good”) ​properties
­ Normal Form​ ­ a certain ​state​ of a ​relation​. Can be determined by applying ​rules regarding dependencies
­ Uses a concept known as ​functional dependency
Functional Dependency
­ Functional Dependency​ ­ a ​semantic restriction​. It expresses the fact that some values for a relation
are not possible, given the way the world works
­ FDs are…
­ relationships​ between ​attributes ​in a relation
­ semantics ​of the attributes in a relation
­ can be inferred in a systematic way ​by applying a set of ​inference rules
­ Inference Rule​ ­ Logic rule for determining FD
­ A→B​ is a functional dependency. Read: A determines B.
­ In a relation R: An attribute B is “functionally dependent” on an attribute A if the value of A
uniquely determines the value of B
­ Armstrong’s Inference Rules​ ­ a set of inference rules that can be used to ​infer all the FDs​ based on
a given set of FDs. Three rules (if x, y, z, w are attributes of a relation R) are:
1. Inclusion (Reflexive) Rule​ ­ ​if y ⊆ x then x → y
­ (⊆ = is a subset of)
­ e.g. IF State ⊆ Postcode, then Postcode→State.
­ 2052→NSW, 3000→VIC
2. Augmentation Rule​ ­ ​if x→y then wx→wy
­ e.g. if Postcode→State then Suburb,Postcode→Suburb,State
­ Randwick,2052→Randwick,NSW
3. Transitivity Rule​ ­ ​if x→y and y→z then x→z
­ e.g. if Postcode→Suburb and Suburb→State then Postcode→State
­ 2052→Randwick and Randwick→NSW then 2052→NSW
Armstrong’s rules can be used to determine ​extended inference rules
­ Additivity (Union) Rule​ ­ ​if x→y and x→z then x→yz
­ IF Postcode→State AND Postcode→Suburb
THEN Postcode→Suburb,State
­ Combines ​Transitivity​ and ​Augmentation
­ Decomposition (Projective) Rule​ ­​ if x→yz then x→y and x→z
­ IF Postcode→Suburb,State
THEN Postcode→Suburb AND Postcode→State
­ Reverse of ​additivity rule
­ Pseudotransitivity Rule​ ­ ​if x→y and wy→z then wx→z
­ IF Suburb→City AND Postcode,City→State
THEN Postcode,Suburb→State
­ Augmentation gives wx→wy; transitivity with wy→z then gives wx→z
­ Accumulation Rule​ ­ ​if x→yz and z→bw then x→yzbw
­ Decomposition​ gives x→z, ​transitivity​ gives x→bw, ​additivity​ then combines x→yz and
x→bw into x→yzbw
Sets of FDs:
­ F​ ­ a set of given FDs
­ F+​ ­ the set of all implied FDs (the full set). Called the ​closure of F
­ F​min​ ­ the minimal set (minimal cover) of FDs equivalent to F
­ No redundancies ­ does not lose info; F and F+ could be determined from F​min
­ Use Armstrong’s inference rules to change F into F+ or F​min
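One systematic way to apply the inference rules is the standard attribute-closure algorithm: starting from a set of attributes, repeatedly fire every FD whose left-hand side is already determined. A short sketch (the Postcode/Suburb/State FDs are taken from the examples above):

```python
def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left-hand side is determined, add the right-hand side
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# F: Postcode -> Suburb, Suburb -> State
fds = [({"Postcode"}, {"Suburb"}), ({"Suburb"}, {"State"})]

# Transitivity falls out automatically: Postcode also determines State,
# i.e. Postcode -> State is in F+ even though it is not in F.
print(closure({"Postcode"}, fds))
```

This gives a practical membership test for F+: an FD x→y is implied by F exactly when y is contained in the closure of x.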
6.2 The Need for Normalisation
Lossless Decomposition
­ Our aim is to ​decompose​ relations/tables so as to ​reduce size/redundancy
­ We use ​inference rules​ for this decomposition ​process
­ We need to be sure that the decomposed components (tables/relations) have the ​lossless​ join
property (i.e., decomposed components can be joined back together to the original table/relation)
Normalisation
­ Normalisation​ is a process for converting a relation to a ​standard (normal) form​. It is about being able to:
­ Decompose a relation/table into ​smaller components
­ In such a way that we could r​ ecapture the precise content​ of the original relation/table if we
would join (i.e. natural join) the decomposed components
­ Based on paper: Codd (1971)
­ Reasons for applying normalisation:
­ Minimise/​eliminate redundancy​ (duplicate data; one entity is recorded more than once in the DB)
­ Prevent data inconsistencies​ through update, deletion, and insertion ​anomalies
­ Addition/insertion anomaly​ ­ Failure to add new data in all places where it needs to
be added (conflicting data)
­ Deletion anomaly​ ­ Failure to remove data from all places where it needs to be removed
­ Update anomaly​ ­ Failure to update data in all places where it needs to be updated
­ To make ​database design consistent
6.3 The Normalisation Process
­ Two types of functional dependence:
­ Partial dependency​ ­ exists when there is a
functional dependence in which the determinant is
only part of the PK
­ For example, if (A, B) → (C,D), B → C, and
(A, B) is the primary key, then the functional
dependence B → C is a partial dependency
because only part of the primary key (B) is
needed to determine the value of C
­ Straightforward to identify
­ Transitive dependency​ ­ such that X → Y, Y → Z,
and X is the primary key. In that case, the
dependency X → Z is a transitive dependency
because X determines the value of Z via Y.
­ More difficult to identify BUT will occur only
when functional dependence exist among
nonprime attributes
1NF
­ Aim: Create a ​valid​ ​relation
­ A relation/table is in ​1NF​ if:
­ All ​attributes​ contain only ​atomic values​ (i.e., there are no multivalued attributes)
­ All ​PK attributes​ are ​defined and not NULL​ (i.e. there is at least one candidate key)
­ Actions to create/check 1NF:
­ Add ​appropriate entry​ in at least the ​PK column(s)
­ Avoid/​split multivalued attributes​ and avoid/​split repeating groups of data​ (i.e. transform
multivalued attributes to additional columns, or better, additional rows (via a new table))
2NF
­ Aim: remove ​partial dependencies​ (no repeating values in non­key fields)
­ A relation/table is in ​2NF ​if:
­ Each non­key field is functionally dependent on the entire PK (​no partial dependencies​)
­ The relation/table is in 1NF
­ Actions to create/check 2NF:
­ Draw ​FD​ and ​partial dependency​ diagrams
­ Remove ​partial dependencies​ (attributes not functionally dependent on the entire PK) by
separating the data items into a separate relation using appropriate PKs (may need a
bridge/junction table)
­ Hint: Look for values that occur multiple times in non­key fields. This tells you that you have too
many fields in a single table. In a well­designed DB, the only duplicated data is in key
fields used to connect tables
3NF
­ Aim: Remove ​transitive dependencies​ (non­key data that depends on other non­key attributes)
­ A relation/table is in ​3NF​ if:
­ It has ​no transitive dependencies​ (no non­key attributes determined by other
non­candidate­key attributes)
­ It is in 2NF
­ Action to create/check 3NF:
­ Identify and remove ​transitive dependencies
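A worked sketch of a 2NF decomposition and the lossless-join check, using SQLite via Python’s sqlite3 (the student/enrolment tables are invented for illustration): the PK of the first table is (stu_id, course), but stu_id alone determines stu_name, a partial dependency.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- PK is (stu_id, course); stu_id -> stu_name is a partial dependency,
    -- so each student's name is stored redundantly, once per enrolment.
    CREATE TABLE enrolment (stu_id INTEGER, course TEXT, stu_name TEXT);
    INSERT INTO enrolment VALUES
        (1,'INFS1603','Ann'), (1,'ACCT1501','Ann'), (2,'INFS1603','Ben');

    -- 2NF decomposition: move the partially dependent attribute out
    CREATE TABLE student AS SELECT DISTINCT stu_id, stu_name FROM enrolment;
    CREATE TABLE takes   AS SELECT stu_id, course FROM enrolment;
""")

# Lossless join: the natural join of the components recaptures the original rows
rejoined = con.execute("""
    SELECT stu_id, course, stu_name
    FROM takes NATURAL JOIN student
    ORDER BY stu_id, course""").fetchall()
print(rejoined)
```

Each student name is now stored once in `student`, and joining the components back together loses nothing, which is the lossless-join property the decomposition must preserve.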
6.4 Improving the design
Areas to consider:
­ Evaluate PK assignments
­ Evaluate naming conventions
­ Refine attribute atomicity (keep subdividing attributes until they can no longer be subdivided)
­ Identify new attributes
­ Identify new relationships
­ Refine PKs as required for data granularity
­ Granularity​ ­ The level of detail represented by the values stored in a table’s row
­ Using a surrogate PK provides lower granularity and yields greater flexibility
­ Maintain historical accuracy (may require redundant data to ensure values aren’t changed)
­ Evaluate using derived attributes
6.5 Surrogate key considerations
­ Surrogate key may be used when:
­ Composite PK is too cumbersome to use, difficult to write search routines
­ PK might have too much descriptive content to be usable
­ Other reasons (e.g. To maintain historical data)
­ Surrogate key usually system­defined, managed via DBMS, numeric, automatically incremented
­ Decision requires trade­offs and professional judgement
­ Limitations might be undesirable from a managerial point of view ­> surrogate keys
6.6 Higher­Level Normal Forms
Boyce­Codd Normal Form (BCNF)
­ Aim: Higher normal forms such as BCNF address some specific problems that can remain in 3NF
(nonetheless, 3NF is widely considered to be “sufficient” by DB designers)
­ A table is in BCNF when every determinant (left hand side of dependency) is a candidate key
­ ∴ ​BCNF can only be violated (in 3NF) if a table contains more than one candidate key
­ A relation/table is in ​BCNF​ if:
­ No non­key attribute​ determines ​part of the PK​ (e.g. if B is part of the PK but C→B holds,
the table is not in BCNF)
­ It is in 3NF
­ Based on the paper Boyce & Codd (1974)
­ Sometimes called ​3.5NF
­ 3NF is always achievable; BCNF is not always achievable (Beeri & Bernstein 1979)
4NF
­ Aim: Remove ​multivalued dependencies​ (one key determines multiple values of two other attributes,
and those attributes are independent of each other)
­ A relation/table is in ​4NF​ if:
­ No row contains two or more multivalued facts about an entity (no multivalued dependencies)
­ The table is in 3NF
­ Action to create/check 4NF:
­ Create new tables for the components of multivalued dependencies
­ Note: 4NF is largely academic; problems shouldn’t be encountered if proper design procedures are
used
6.7 Normalisation and Database Design
­ Normalisation should be part of the design process
­ You should be aware of good design principles and procedures as well as normalisation procedures:
­ ERD is created through iterative process
­ Normalisation focuses on the characteristics of specific entities (a micro view of the ERD), so it is
difficult to separate normalisation and ER modelling
6.8 Denormalisation
­ Normalisation is only ​one of many DB design goals
­ Normalised (decomposed) tables require ​additional processing​ ­> ​↓ ​processing speeds
­ Normalisation ​purity​ is often ​difficult t​ o sustain in the modern DB environment
­ Conflicts between design efficiency, info requirements, and processing speed solved through
compromises/tradeoffs​ inc. ​denormalisation
­ Denormalisation​ ­ Process of attempting to optimise the performance of a DB by (re­)adding
redundant data or by grouping data (reverse process of normalisation)
­ Advantage of higher processing speed must be carefully weighed against disadvantage of data
anomalies
­ Further, some anomalies are of only theoretical interest and are not practical to remove (e.g. a
separate ZIP table (ZIP_Code, City) split out of a customers table)
­ Use common sense
­ Defects of unnormalised tables:
­ Data anomalies
­ Less efficient data updates due to larger tables
­ More cumbersome indexing
­ No simple strategies for creating ‘views’ (virtual tables)
Summary
­ Normalisation is a ​table design technique​ aimed at minimising data redundancies
­ First 3 normal forms (1NF, 2NF, 3NF) are most commonly used
­ Normalisation is an important part ­ but ​only a part​ ­ of the design process
­ Best practice: Continue the iterative ER process until all entities and their attributes are defined and all
equivalent ​tables are in 3NF
­ In exam:
­ If 3NF isn’t necessary, explain why ­ looks good
­ Go through steps of normalisation
Lecture Notes ­ Try to find a place to put these
Argument
­ Argument​ ­ In logic, an argument is a ​set of statements​ of which some (the ​premises​) are
intended to support another statement (the ​conclusion​)
­ “Valid” ​argument =/= ​“True”​ argument
­ Valid means the argument is following a logical structure (“truth preserving”)
­ Valid does not mean the contents are true (premise must be right)
Deduction
Deduction/deductive argument​ ­ An argument in which the ​truth of the conclusion​ necessarily follows from
the ​truth of the premises
­ Makes an ​absolute ​argument
­ A DA is ​“valid”​ if it succeeds in providing logical support for its conclusion (if all premises are
true, then the conclusion must be true)
­ e.g. A>B and B>C then A>C (Daniel is human, humans are mortal ­> Daniel is mortal)
­ A DA is ​“invalid”​ if the truth of the premises does not guarantee that the conclusion is true. Not
sound
­ e.g. A>B and A>C then B>C (Daniel is a lecturer, Daniel is German ­> lecturers are
German)
­ The logical structure of a deductive argument is ​“truth preserving”​: the truth of the premises is
preserved onto the conclusion
­ A deductively valid argument with true premises is ​“sound”
Induction
­ Induction/Inductive Argument​ ­ An argument whose ​probabilistic support of the conclusion
stems from data/real­world observation
­ Claims the conclusion is ​likely true​, but not necessarily true (the best answer)
­ An argument is ​strong​ if it is backed up by significant support, and ​weak​ if it is without such
support
­ A good inductively strong argument with true premises is ​“cogent”
­ e.g. All dogs you see have fleas, Bruno is a dog ­> Bruno is likely to have fleas (likely but not
necessarily true)
Abduction
­ Abduction​ ­ “reverse implication”: inferring the most plausible explanation for an observed result
­ e.g. You have a white bean (the result), and you know that all beans in my bag are white (the
generalisation). Hence, this bean is probably from my bag, for if it were, it would have to be white
Inference
­ Inference ​­ the process or outcome of “inferring”: deriving by reasoning or concluding from premises or
evidence
­ The process of deriving the ​strict logical consequences​ of assumed premises (​deductive
inference​). An inference is a single step in a deductive chain
­ The process of arriving at some conclusion that, though it is not logically derivable from the
assumed premises, possesses some ​degree of probability​ relative to the premises (​inductive
inference​)
­ In logic, ​modus ponens​ and ​modus tollens​ are two forms for making valid inferences/valid arguments
­ Modus ponens
1. If p is true, then q is true (Daniel is reliable, so when it’s lecture time, Daniel is at UNSW)
2. p is true (it’s lecture time)
Therefore, q is true (Therefore, Daniel is at UNSW)
­ Modus tollens
1. If p is true, then q is true (Daniel is reliable…)
2. q is not true (Daniel is not at UNSW)
Therefore, p is not true (Therefore, it’s not lecture time)
­ Fallacy of modus tollens/Denying the antecedent
1. If p is true, then q is true (Daniel is reliable…)
2. p is not true (it’s not lecture time)
Therefore, q is not true?? (Therefore, Daniel is not at UNSW??)
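These three argument forms can be checked mechanically with a truth table: a form is valid exactly when no assignment of truth values makes all premises true and the conclusion false. A small sketch in Python:

```python
from itertools import product

def implies(p, q):
    """Material implication: 'if p then q' is false only when p and not q."""
    return (not p) or q

def valid(premises, conclusion):
    """Valid iff every row where all premises hold also makes the conclusion hold."""
    return all(conclusion(p, q)
               for p, q in product([True, False], repeat=2)
               if all(prem(p, q) for prem in premises))

modus_ponens = valid([lambda p, q: implies(p, q), lambda p, q: p],
                     lambda p, q: q)
modus_tollens = valid([lambda p, q: implies(p, q), lambda p, q: not q],
                      lambda p, q: not p)
deny_antecedent = valid([lambda p, q: implies(p, q), lambda p, q: not p],
                        lambda p, q: not q)

print(modus_ponens, modus_tollens, deny_antecedent)  # True True False
```

The counterexample the table finds for denying the antecedent is p false, q true: it is not lecture time, yet Daniel happens to be at UNSW anyway.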
Chapter 7: Introduction to Structured Query Language
7.1 Introduction to SQL
­ Relational DBMS’s query languages (e.g. SQL in Oracle) contain 3 components:
1. Data Definition Language (DDL)​ ­ Used to specify the database schema or modify an existing
one (Create table)
2. Data Manipulation Language (DML)​ ­ Used to manipulate the data (work with existing tables)
3. Data Control Language (DCL)​ ­ Used to control the DB, including saving of data (data access
rights to which user)
Data Definition Language
­ Data Definition Language (DDL)​ ­ DDL SQL statements define the ​structure of a database​, inc. rows,
columns, tables, indexes and DB specifics such as file locations
­ Closer to the specific DBMS ­> ​large differences between the SQL variations
­ DDL SQL commands inc. the following (in Oracle SQL):
­ CREATE​ to make a new DB, table, index or stored query
­ DROP​ to destroy an existing DB, table, index or view
­ DBCC ​(Database Console Commands) statements check the physical and logical consistency
of data
Data Manipulation Language
­ Data Manipulation Language (DML)​ ­ DML SQL statements used to ​retrieve and manipulate​ data
from the DB (i.e. this category encompasses the most fundamental commands inc. DELETE, INSERT,
SELECT, and UPDATE etc.)
­ Only ​minor differences between SQL variations
­ DML SQL commands inc. the following:
­ DELETE​ to remove rows
­ INSERT​ to add a row
­ SELECT​ to retrieve a row
­ UPDATE​ to change data in specified columns
­ Two types of DML:
1. Procedural, low­level DML​ ­ Specify exactly ​what​ data is needed and ​how​ this data is to be
created (e.g. programming language C, relational algebra)
­ What you do and how you do it (e.g. open file)
2. Non­procedural, high­level DML​ ­ Specify exactly ​what​ data is needed, but not how to create
this data (leaving the ​how​ to the internal implementation of a DBMS such as Oracle) (e.g. query
language SQL, relational calculus)
Data Control Language
­ Data Control Language (DCL)​ ­ DCL SQL statements control the ​security and permissions​ of the
objects or parts of the DB
­ Closer to the specific DBMS, hence ​large differences between the SQL variations
­ DCL SQL commands inc. the following (in Oracle SQL):
­ GRANT​ to allow specified users to perform specified tasks
­ DENY​ to disallow specified users from performing specified tasks
­ REVOKE ​to cancel previously granted or denied permissions
Relational Languages
­ Codd (1970, 1971)’s ​relational model​ is the conceptual and theoretical basis for relational DBs.
Includes ​two relational languages​:
1. Relational Algebra​ ­ ​procedural​, ​low­level language​ that provides a procedural (step­by­step)
way of specifying queries (relational algebra provides an ​order of steps​ to get to certain data)
2. Relational Calculus​ ­ ​non­procedural​, ​high­level language​ that provides a declarative way to
specify DB queries (“declares” a ​definition​ to get to certain data)
­ SQL is user­friendly relational calculus
­ For every expression in relational algebra there is an equivalent in relational calculus and vice versa
(​logically equivalent​)
­ Relational algebra/calculus are ​not very user friendly​. People almost always use ​SQL​ which is ​based
on relational calculus​, to work with RDBMS
Relational Algebra
­ Relational algebra has ​operations​. These fall into 3 main categories:
1. Union, Intersection​ and ​Difference​ ­ Boolean operations to define a new relation based on two
existing relations
2. Selection​ and ​Projection​ ­ Operations that remove parts of a relation
3. Cartesian Product​ and ​Join​ ­ Operations that combine the tuples of two relations
Union, Intersection and Difference
­ Union, Intersection and Difference are ​operations*​ on ​two relations (R and S)​, both relations should
have schemas with ​identical sets of attributes​ and ​identical order of the attributes
­ *​Other terms for “operations” are “​operators”​ and “​set operations​” (because they refer to
mathematical sets of distinct objects)
­ UNION: ​R ∪ S
­ The union of R and S is the set of all tuples that are in
R, S or both
­ In short: ​combine all tuples!
­ INTERSECT: ​R ∩ S
­ The intersection of R and S is the set of tuples that
appear in both tables
­ In short: ​find the common tuples!
­ DIFFERENCE: ​R ­ S
­ The difference of R and S is the set of tuples that are in
R but not in S
­ In short: ​subtract the tuples in S from the tuples in R!
Selection and Projection
­ Selection and projection operations are applied to a single relation (R)
­ SELECTION​ ­ Returns a relation that contains only those tuples from a specified relation (R) that
satisfy a specified condition (​horizontal subset of a table​)
­ Relational operator is σ: ​σ​ predicate​ (R)
­ PROJECTION​ ­ Returns a relation that contains the tuples for selected attributes from a specified
relation (R), eliminating duplicates (​vertical subset of a table​)
­ Relational operator is Π: ​Π​ attribute 1, …, attribute n​ (R)
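Both operators are easy to sketch directly in Python, treating a relation as a list of dicts (the city data is invented for illustration). Note how projection removes duplicates, mirroring set semantics:

```python
# A relation R as a list of tuples-as-dicts
R = [{"id": 1, "city": "Sydney"},
     {"id": 2, "city": "Perth"},
     {"id": 3, "city": "Sydney"}]

def select(predicate, relation):
    """Sigma: horizontal subset - keep tuples satisfying the condition."""
    return [t for t in relation if predicate(t)]

def project(attributes, relation):
    """Pi: vertical subset - keep listed attributes, eliminating duplicates."""
    seen, out = set(), []
    for t in relation:
        row = tuple((a, t[a]) for a in attributes)
        if row not in seen:
            seen.add(row)
            out.append(dict(row))
    return out

print(select(lambda t: t["city"] == "Sydney", R))  # rows 1 and 3
print(project(["city"], R))                        # Sydney appears only once
```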
Cartesian Product and Join
­ Cartesian ​= “relating to René Descartes (1596­1650) and his ideas”. Descartes made major progress
in analytic geometry
­ Cross Join (Cartesian Product)​ ­ Selects all possible combinations of tuples in R with tuples in S
­ “R × S”, “all possible tuple combinations of two relations”, “everything joins everything”
­ In SQL:
­ Explicit ​cross join ­ SELECT * FROM R ​CROSS JOIN​ S
­ Implicit ​cross join ­ SELECT * FROM R, S
­ Inner Join​ ­ Returns combined tuples from two relations that have the same value for a defined
attribute (match on the attribute/fulfil a certain criterion). Default/most common join type
­ Explicit: SELECT * FROM R ​INNER JOIN​ S ON R.attribute = S.attribute
­ Implicit: SELECT * FROM R, S WHERE R.attribute = S.attribute
­ Equi Join​ ­ joins based on equivalence (=) (as in the example)
­ Theta Join​ ­ when other comparison operators are used (<=, >=, <, >)
­ Natural Join​ ­ joins tuples based on all attributes with identical names in the two relations (tuples
agree in value for whatever attributes are common to the schemas of R and S ­ attributes are not
explicitly specified)
­ Full Outer Join​ ­ Selects and joins tuples from two tables that match on a defined attribute. If there is no
match for a tuple, the tuple will still appear, with missing attributes shown as NULL
­ SELECT * FROM R ​FULL OUTER JOIN ​S ON R.attribute = S.attribute
­ Left Outer Join​ ­ Selects and joins tuples from the “left” table (R) with tuples from the “right” table (S) on
defined attributes. If there is no match, the attributes from the right side will contain NULL values
­ SELECT * FROM R ​LEFT OUTER JOIN​ S ON R.attribute = S.attribute
­ Right Outer Join​ ­ Selects and joins tuples from the “left” table (R) with tuples from the “right” table (S) on
defined attributes. If there is no match, the attributes from the left side will contain NULL values
­ SELECT * FROM R ​RIGHT OUTER JOIN​ S ON R.attribute = S.attribute
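A left outer join can be tried in SQLite via Python’s sqlite3 (tables are illustrative; older SQLite versions support only LEFT OUTER JOIN, which is enough to show how the unmatched side becomes NULL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE r (id INTEGER, name TEXT);
    CREATE TABLE s (id INTEGER, score INTEGER);
    INSERT INTO r VALUES (1,'Ann'), (2,'Ben');
    INSERT INTO s VALUES (1, 85);            -- no matching row for id 2
""")

rows = con.execute("""
    SELECT r.id, r.name, s.score
    FROM r LEFT OUTER JOIN s ON r.id = s.id
    ORDER BY r.id""").fetchall()

# Ben has no match in s, so his score comes back as NULL (None in Python)
print(rows)
```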
SQL
­ SQL = Structured Query Language (pronounced “sequel”)
­ SQL is the ​first standard database language
­ Originally ​developed by D. Chamberlin and R. Boyce at IBM
­ The most common SQL standard is ANSI/ISO SQL. The latest revision is ​SQL:2011
­ Microsoft, Oracle, and other vendors have introduced deviations from ANSI SQL
­ As a relational language, SQL has ​three main components​:
­ Data Definition Language (DDL)
­ Data Manipulation Language (DML)
­ Data Control Language (DCL)
SQL DDL
­ To create the database structure:
­ CREATE SCHEMA AUTHORIZATION creator
­ e.g. CREATE SCHEMA AUTHORIZATION Chris
­ CREATE DATABASE Database_Name
­ e.g. CREATE DATABASE Student
­ To create tables:
­ CREATE TABLE Table_Name​ (
column_name
data_type [NULL | NOT NULL],
…
);
­ Security considerations may require that certain data be hidden from users
­ A ​view​ is any relation that is made ​visible to the user
­ A view is a “​virtual relation​”
­ The SQL command is:
­ CREATE VIEW Viewname AS Statement
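The DDL statements above can be tried in SQLite via Python’s sqlite3 (a stand-in for Oracle; the booking table and view name are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# DDL: define the table structure, with NULL/NOT NULL per column
con.execute("""CREATE TABLE booking (
    booking_id INTEGER NOT NULL,
    guest_name TEXT NULL,
    price      REAL NOT NULL)""")
con.execute("INSERT INTO booking VALUES (1,'Ann',200.0), (2,'Ben',350.0)")

# A view is a virtual relation: a stored query with no stored data of its own
con.execute("CREATE VIEW expensive AS SELECT * FROM booking WHERE price > 250")

print(con.execute("SELECT guest_name FROM expensive").fetchall())
```

Querying the view re-runs its defining SELECT, so it always reflects the current contents of `booking`.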
SQL DML
­ ANSI/ISO SQL standard use the terms ​“tables”, “columns”​ and
“rows”​ (not relations, attributes, and tuples)
­ The ​principal SQL DML statements​ are:
­ SELECT
­ INSERT
­ UPDATE
­ DELETE
­ Complete SQL statements consists of ​reserved words​ and ​user­defined​ words:
­ The ​reserved words​ are fixed ​part of the language
­ The ​user­defined words​ represent the meaning of the data to the user (e.g. “users”,
“bookings”)
Understanding SQL Query Structures
­ The ​SELECT​ statement is used to retrieve and display data from one or more tables
­ Relational algebra’s ​selection, projection and join statements​ can be performed with ​one single
SELECT statement
­ “SELECT FROM WHERE”
­ SELECT​ clause tells which attributes of the tuples matching the condition are produced as part
of the answer
­ FROM​ clause gives the names of relation(s)
­ WHERE​ clause is a condition that tuples must satisfy in order to match the query
SELECT​ ​[​DISTINCT​ ​| ​ALL​]​ ​{​* ​|​ ​[​column_expression ​[​AS new_name​]] [​, …​]​}
FROM​ table_name ​[​alias​]​ ​[​, …​]
[​WHERE​ ​condition​]
[​GROUP BY​ column_list​]
[​HAVING​ condition​]
[​ORDER BY​ column_list​]​;
­ []​ = optional elements
­ {}​ = required element (a choice of alternatives)
­ |​ = “or”
­ ; = end of the statement
SQL allows us to use the keyword ​ALL​ to specify that all tuples are to be selected
­ SELECT ALL * FROM ​PRODUCT
OR
SELECT * FROM​ PRODUCT
SQL supports elimination of duplicates using the keyword ​DISTINCT
­ SELECT DISTINCT​ Std_name
FROM​ STUDENTS
Mathematical Operators for SQL
­ Mathematical operators that can be used in the ​WHERE​ clause:
­ =   equal to
­ <   less than
­ <=  less than or equal to
­ >   greater than
­ >=  greater than or equal to
­ <>  not equal to
ASCII Codes in SQL
­ All characters/signs are assigned an ​ASCII​ (American Standard Code for Information Interchange)
code by the computer
­ Comparisons of strings are made from left to right ­> ​useful for names, problems for numbers
and dates​ (e.g. “2” is > “11”, and “01/01/2020” is sorted before “12/31/2015” because 0 < 1)
­ Recommendation: use the date/number format instead of strings
Logical (Boolean) Operators in SQL
­ Logical operators are:
­ OR
­ AND
­ NOT
­ Found in ​WHERE​ clause
Special Operators in SQL
­ BETWEEN​ ­ Used to define range limits
­ IS NULL​ ­ Used to check whether an attribute value is null
­ LIKE​ ­ Used to check for similar character strings
­ IN​ ­ Used to check whether an attribute value matches a value contained within a subset of listed values
­ EXISTS​ ­ Used to check whether a subquery returns any rows
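The first four special operators in action, using SQLite via Python’s sqlite3 (the product table is invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE product (p_code INTEGER, p_desc TEXT, p_price REAL);
    INSERT INTO product VALUES
        (1,'hammer',9.95), (2,'saw',24.50), (3,'screwdriver',NULL);
""")
q = lambda sql: con.execute(sql).fetchall()

# BETWEEN: range limits (inclusive)
between = q("SELECT p_code FROM product WHERE p_price BETWEEN 5 AND 20 "
            "ORDER BY p_code")
# LIKE: similar character strings ('%' matches any run of characters)
like = q("SELECT p_code FROM product WHERE p_desc LIKE 's%' ORDER BY p_code")
# IN: membership in a listed subset of values
within = q("SELECT p_code FROM product WHERE p_code IN (1, 3) ORDER BY p_code")
# IS NULL: the only correct way to test for a missing value (= NULL never matches)
missing = q("SELECT p_code FROM product WHERE p_price IS NULL")

print(between, like, within, missing)
```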
Ordering SQL Results
­ ORDER BY ​<columns> : produces a list in ascending order (also [ASC])
­ ORDER BY ​<columns> [​DESC​] : produces a list in descending order
SQL Numeric Functions (Aggregate Functions)
­ Numeric functions include:
­ COUNT​ : the ​number of rows​ containing a specified attribute
­ MAX​ : the ​maximum​ value encountered
­ MIN​ : the ​minimum​ value encountered
­ AVG​ : the arithmetic ​mean​ (average) for the specified attribute
­ SUM​ : the ​total value​ for the specified numeric attribute
­ Numeric functions yield only one single value
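A quick check that each aggregate collapses a whole column into one value, using SQLite via Python’s sqlite3 (prices are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE product (p_price REAL);
    INSERT INTO product VALUES (10.0), (20.0), (30.0);
""")

# Three input rows, but each aggregate function yields exactly one value
row = con.execute(
    "SELECT COUNT(*), MIN(p_price), MAX(p_price), AVG(p_price), SUM(p_price) "
    "FROM product").fetchone()
print(row)  # (3, 10.0, 30.0, 20.0, 60.0)
```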
Unique vs. Distinct
­ SELECT ​DISTINCT​ XY is correct ANSI SQL syntax
­ SELECT ​UNIQUE​ XY is old Oracle SQL syntax (otherwise identical to DISTINCT)
­ Note: you still use UNIQUE to create tables and indexes
­ CREATE TABLE Test (Attribute Numeric NOT NULL ​UNIQUE​);
­ CREATE ​UNIQUE​ INDEX Unique_Index ON Table (Attribute) TABLESPACE Tablespace;
­ Note: Unique indexes guarantee that no two rows of a table have duplicate values in the key column(s).
Non­unique indexes do not impose this restriction
Grouping Data in SQL
­ GROUP BY ​<column>
­ A query that includes the ​GROUP BY​ clause groups the data from the SELECT table(s) and produces a
single summary row for each group
­ The SELECT clause may contain column names, aggregate functions or constants
­ All column names in the SELECT list must appear in the ​GROUP BY​ clause unless the name is used
only in an aggregate function
­ The ​GROUP BY​ clause is valid only when used in conjunction with one of the SQL aggregate functions
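These rules can be seen in a small grouping query, using SQLite via Python’s sqlite3 (the sales data is invented for illustration); HAVING then filters whole groups the way WHERE filters rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sale (region TEXT, amount REAL);
    INSERT INTO sale VALUES ('NSW',100), ('NSW',200), ('VIC',50);
""")

# region is in the SELECT list, so it must also appear in GROUP BY;
# COUNT and SUM are computed once per group, not per row
rows = con.execute("""
    SELECT region, COUNT(*), SUM(amount)
    FROM sale
    GROUP BY region
    HAVING SUM(amount) > 60
    ORDER BY region""").fetchall()

print(rows)  # VIC's group total (50) fails the HAVING condition
```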
Multiple Table Operations in SQL
­ “Multiple table operations” are “joining operations”! (see earlier)
­ SELECT clause identifies the attributes to be displayed
­ FROM clause identifies the tables from which attributes are selected
­ WHERE clause specifies the joining condition for common columns
Lecture Notes: Object Oriented Modelling
8.1 Benefits/Limitations of ​ER/RDB​ Design
­ Relational modelling of data is not the “perfect” solution
­ Relational modelling is not the only approach to data modelling
Benefits
­ ER modelling​ is a common and easy design technique
­ Models can be transformed, via ​normalisation​ techniques, to be implemented in standard
SQL­based DBs
­ Clear separation between applications (operations) and DB schema (data)​; data can be used in
different applications
Limitations
­ ER models cannot adequately support complex data ­ the more complex the system, the harder it is to model
­ Poor representation of “real­world” entities ­> many joins during query processing (why we denormalise)
­ Semantic overloading
­ Limited types of operations supported ­ the more complicated operations must be done in the application
­ Handling of recursive queries is difficult
­ Schema changes are difficult
8.2 Object Modelling Concepts
Objects and Classes
­ Object­oriented analysis and design (OOAD​) models the world in objects
­ Object ­ An entity that has a well­defined role in the application domain (our system). Has a state, behaviour and identity.
­ State​ ­ State of an object encompasses its properties (attributes and relationships) and the
values those properties have. (i.e. all values and relationships defined)
­ Behaviour​ ​­ Represents how an object acts and reacts (operations or ‘methods’)
­ Identity ­ Each object is uniquely distinguishable from every other object, even when their states are identical
­ Object class​ ­ a set of objects that share a common structure
(share attributes, operations and relationships) (i.e. not the
instance)
­ Class diagram​ ­ an object­oriented model showing:
­ The object classes relevant for a system
­ The internal structure of these object classes
­ The relationships between object classes
­ The overall structure of the system
­ Class diagram is similar to ER ​EXCEPT​ we show what objects
can do (behaviours)
­ Two categories of relationships​:
­ Associations​ ­ Horizontal relation between two object
classes
­ e.g. “Students” (object class 1) may “read”
(association) “books” (object class 2)
­ Subtype/supertype​ ­ Vertical relation between two object classes
­ e.g. “Nurses (object class 1) are “a specific kind of” (subtype) “people” (object class 2)
­ Class diagrams show details about each object class:
­ Attributes​ ­ The dimensions/characteristics of an object class
­ e.g. “Lecturers” (object class) have an “age” and a “faculty” (attributes)
­ Operations​ ­ The functions/services/behaviours/methods provided by an object class
­ e.g. “Lecturers” (object class) can “teach” and “research” (operations)
Derivation
­ Derived attribute​ ­ An attribute that can be derived from (is based on) other attributes
­ Derived association​ ­ An association that can be derived from other associations
­ In a class diagram, a forward slash (/) indicates derivation
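In code, a derived attribute maps naturally onto a computed accessor rather than a stored field. A small Python sketch (the Lecturer class and its attributes are invented for illustration):

```python
from datetime import date

class Lecturer:
    def __init__(self, name, birth_year):
        self.name = name
        self.birth_year = birth_year      # stored attribute

    @property
    def age(self):                        # derived attribute (/age on a class diagram)
        return date.today().year - self.birth_year

ben = Lecturer("Ben", 1990)
print(ben.age)  # computed from birth_year on demand, never stored
```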
Encapsulation
­ Encapsulation​ ­ an object hides details not relevant for their use from other objects
­ Core idea of OO
­ Objects can be changed only through the use of their interfaces (public operations)
­ Can’t be edited by other methods
­ Private operations are not visible, can only be executed by object
­ Benefits of encapsulation:
­ Control​ ­ if something odd is happening, you know exactly where to look (everything is
self­contained)
­ Flexibility​ ­ you can leave work on the internal parts of the object until late
­ Structure​ ­ Impose structure on data and system and system is just objects organised
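A minimal Python sketch of the idea (the Account class is a made-up example): the underscore-prefixed field is internal state, and the public methods are the interface through which alone it changes.

```python
class Account:
    def __init__(self, opening_balance):
        self._balance = opening_balance   # internal state, hidden by convention

    # Public interface: the only sanctioned way to change the state,
    # so any odd balance can be traced back to these two methods.
    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def balance(self):
        return self._balance

acc = Account(100)
acc.deposit(50)
print(acc.balance())  # 150
```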
Inheritance
­ Inheritance​ ­ The ability of an object class to inherit the attributes and operations of its superclass(es)
­ e.g. Class of cats is a subclass of class of mammals. The class of mammals are a superclass of
the class of cats
­ Single inheritance​ ­ A class inherits only from ​one superclass
­ Multiple inheritance​ ­ A class inherits from ​several superclasses
Superclasses and Subclasses
­ Classes​ can be organised into a ​class hierarchy
­ A class can have ​multiple parent classes​ (several superclass­subclass relationships)
­ A ​generalisation path​ (specialisation path) is shown as a solid line from the subclass to superclass
with a hollow triangle at the end pointing toward the superclass
­ Disjointness constraint
­ Disjoint​ ­ A subclass has no overlapping attribute with another subclass
­ Overlapping​ ­ A subclass may have overlapping attributes with another subclass
­ Completeness constraint
­ Incomplete​ ­ There could be other subclasses than those shown on the class diagram
­ Complete​ ­ There cannot be other subclasses; all subclasses are shown on the class diagram
­ Concrete class ­ A class that has direct instances
­ Real world objects e.g. Research student/Coursework student
­ Abstract class ­ A class that has no direct instances, but its subclasses may have direct instances
­ A conceptual placeholder that exists to hold shared attributes (e.g. Postgrad student)
Overriding inheritance
­ Overriding​ ­ The process of ​replacing a method inherited from
a superclass​ by a more specific implementation of that method
in a subclass
­ Define new operation with same name ­> pick local method
over supertype method
­ Reasons for overriding:
­ Extensions​ add to the operation
­ Restrictions​ limit the operation
­ Optimisations​ improve the operation
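Inheritance and overriding together, in a short Python sketch (the Mammal/Cat classes echo the example above; Dog is an added illustration):

```python
class Mammal:                    # superclass
    def speak(self):
        return "generic mammal sound"

class Cat(Mammal):               # subclass inherits attributes and operations
    def speak(self):             # overriding: the local method wins over the inherited one
        return "meow"

class Dog(Mammal):
    pass                         # no override, so speak() is inherited unchanged

print(Cat().speak())  # meow
print(Dog().speak())  # generic mammal sound
```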
Containment (Aggregation and Composition)
­ Two forms of ​containment​ type parent­child relationships
­ Aggregation​ ­ Implies a relationship where the child object can exist independently of the
parent object
­ Composition​ ­ Implies a relationship where the child object cannot exist independently of the
parent object
­ Containment differs from subclass­superclass relationships: it expresses a part­of relationship between objects, not an is­a relationship between classes
­ e.g. a lecture hall contains chairs that can exist independently of it ­> aggregation (logical relation)
­ e.g. a house is made of rooms that cannot exist without it ­> composition (hierarchical relation)
­ Aggregation ­ Implies a relationship where the child object can exist independently of the parent object
­ Expresses a part­of relationship between a component object and an aggregate object
­ Is a kind of association in which a whole, the assembly, is composed of parts, the components
­ e.g. ​Course (parent) and Student (child). Delete the Course and the Students still exist
­ Represented with a ​hollow diamond​ at the aggregate end (parent)
­ Composition ­ Implies a relationship where the child object cannot exist independently of the parent object
­ A stronger form of aggregation
­ e.g. ​House (parent) and Room (child). Delete the House and the Rooms cease to exist as well
­ Composition is represented with a ​solid diamond​ at the composed end (parent)
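The two containment forms can be contrasted in Python, where the difference is ownership rather than syntax (the classes below are illustrative; Course/Student and House/Room follow the examples above):

```python
class Student:                       # child in an aggregation: exists on its own
    def __init__(self, name):
        self.name = name

class Course:                        # aggregation: Course only *references* Students
    def __init__(self, title, students):
        self.title = title
        self.students = students     # components supplied from outside

class Room:                          # child in a composition: owned by its House
    def __init__(self, number):
        self.number = number

class House:                         # composition: House *creates and owns* its Rooms
    def __init__(self, n_rooms):
        self.rooms = [Room(i) for i in range(n_rooms)]

s = Student("Ada")
c = Course("INFS1603", [s])
del c                                # the Course is gone, the Student survives
h = House(3)                         # Rooms exist only inside the House
print(s.name, len(h.rooms))          # Ada 3
```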
Polymorphism
­ Polymorphism​ ­ The ability of an operation to be applied to many classes
­ Polymorphism ­> operations with the same name work across different classes, each providing its own implementation
­ e.g. Class: Juggler, operation: Throw() vs. Class: Ball, operation: Throw()
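The Juggler/Ball example above, sketched in Python (the return strings are invented):

```python
class Juggler:
    def throw(self):
        return "juggler tosses a ball"

class Ball:
    def throw(self):
        return "ball flies through the air"

# The same operation name applies to both classes; each object
# supplies its own behaviour when throw() is called.
results = [obj.throw() for obj in (Juggler(), Ball())]
print(results)
```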
8.3 Benefits/Limitations of OO Design
Benefits
­ The OO design approach provides both the data identification (in the same construct, the object) and the procedures (data manipulation) to be performed
­ It supports complex data structures and provides a much better implementation of the real­world model
­ Can very easily use objects somebody has already created (if they make them available) ­ “Toolkit of Classes” (Daniel)
­ Not platform dependent (neither is RDB, but its application is)
­ Makes sense for low­level data
Limitations
­ It is hard to learn (conceptually different philosophy)
­ Code reusability is not easy to implement
­ Creation of class hierarchy and defining interrelationships is difficult
­ Queries may have to be written in 3GL (e.g. C++) ­ writing methods ­> programming, which requires professionals who are hard to find
­ Few tools ­ support (e.g. SQL) is not strong
­ Lack of support for views & security (don’t have DBMS)
­ Also expensive
8.4 Comparison between OOm and ERm
8.5 Summary of UML OO Modelling
Chapter 9: Database Development Process
9.1 The Information System
­ Information systems ​(IS)​ ​­ Systems that use IT to capture, transmit, store, retrieve, manipulate or
display ​information​ used in one or more business processes
­ Important issues when building IS:
­ The system must ​solve​ the right ​problem
­ The system must be ​built​ in the most ​effective​ way
­ The system must fit into the existing environment
­ The system must be ​easy ​to ​use
­ The performance of an IS depends on several factors:
­ Application design and implementation (programming)
­ DB design and implementation
­ Administrative procedures
­ Systems analysis​ ­ The process that establishes the need for and the extent of an IS (business rules)
­ Systems development​ ­ The process of creating an IS
­ Database Design​ ­ Takes place within the context and limits of an IS development process
­ Note: the plan and reality are likely to differ!
Evolution of Software Development Process Models
­ Code­and­Fix Model ­ Software shipped with the computer (e.g. a microwave); problems were fixed by the manufacturer as they arose
­ Stagewise Model​ ­ Organised things with a process
­ Waterfall Model​ ­ Organised into 4­5 stages which require signing off ­> stage to stage ­> can use
management principles (e.g. gating)
­ Key to making software development manageable
­ Used in large projects
­ Agile Software Development​ ­ Continually work on short iterations NOT big stages ­> reflects
changing requirements
­ Don’t really know cost/timeline etc.
­ Start­ups and end­user software
9.2 The Systems (Software?) Development Life Cycle (SDLC)
The 5 stages of the SDLC are:
­ Planning​ ­ general overview of the company and objectives
­ Initial assessment
­ Feasibility of a new system (feasibility study). Should address:
­ Technical aspects of hardware/software requirements
­ System cost
­ Operational cost
­ Analysis ​­ Problems defined during planning phase examined in greater detail
­ User requirements
­ Existing system evaluation (How do these requirements fit into overall system?)
­ Logical system design (Data Flow Diagrams, ER Diagrams, etc.)
­ Detailed System Design
­ Completion of design which includes screens, menus, reports etc. (back­end and front­end)
­ Training principles and methodologies also planned
­ Implementation ​­ hardware, DBMS software, application software installed, and DB design is
implemented
­ Cycle of coding, testing and debugging until it's ready for use
­ Installation, fine­tuning
­ Maintenance ​­ includes ​corrective maintenance​ (response to system errors), ​adaptive maintenance
(due to changes in business environment) and ​perfective maintenance​ (to enhance the system)
­ Evaluation
­ Maintenance
­ Enhancement
­ If maintenance cost is too high, the system’s value is suspect
9.3 The Database Life Cycle (DBLC)
­ Also called the Database Development Lifecycle (DDLC)
­ Part of/subset of/embedded in the SDLC (parallel)
­ Six phases:
1. Database Initial Study
2. Database Design
3. Implementation and Loading
4. Testing and Evaluation
5. Operation
6. Maintenance and Evolution
DBLC Phase 1: Database Initial Study
­ Analysing​ the ​organisation
­ Objectives
­ Operations
­ Structures
­ Defining ​problems​ and ​constraints
­ Function of existing system
­ Input of existing systems
­ Output of existing systems
­ Defining ​objectives
­ Initial objectives
­ Data sharing and interfaces with other
systems
­ Defining scope and boundaries
­ DB design for which part of the org?
­ What hardware will be used?
DBLC Phase 2: Database Design
­ Conceptual Design
­ Data analysis​ and ​data requirements​ ­ What are the end­user views needed? What are the
inputs and outputs needed? What info is needed and where does it come from? Is it necessary?
Feasible?
­ ER modelling ​and ​normalisation​ ­ What are the business rules? What are the entities,
attributes and relationships for ER diagram? What are the keys? Do we need to normalise?
­ Model verification
­ Logical Design
­ Translating/​mapping​ the conceptual design into internal model of a selected DBMS (e.g.
ORACLE, Access, MySQL, etc.)
­ Physical Design
­ Defining data access characteristics of the database (e.g. indexes)
­ Optimising ​performance​ (e.g. choice of storage medium, hardware level)
­ Resources for implementation
­ DBMS
­ Hardware
­ Note: Logical and physical design can be carried out in parallel activities, but requires high level of
understanding of software and hardware
DBLC Phase 3: Implementation and Loading
Core tasks:
­ Creating DB
­ Assigning permissions​ to a database ​administrator​ (DBA)
­ Creating tables​ within DB
­ Assigning permission​ to ​users
Further areas needing attention:
­ Performance​ ­ hardware, software, indexes, buffer size, etc.
­ Security​ ­ Physical security, password security, access rights, audit trails, data encryption, diskless
workstations
­ Backup​ and recovery
­ Data integrity
­ Company standards
­ Concurrency control​ ­ Allowing simultaneous access to DB while preserving data integrity
Database security and data privacy
­ Database security​ is to ensure that only authorised users can perform authorised activities at
authorised times
­ Auth​entication​ ­> user has the basic right to use the system (which user it is)
­ Auth​orization​ ­> user has the right to do specific activities on the system (particular rights)
­ Data Privacy​ (information privacy) ­ the relationship between collection and dissemination of data,
technology, the public expectation of privacy and the legal and political issues surrounding them
­ Privacy concerns exist wherever personally identifiable info is stored.
­ Legal/ethical use of DB
­ Data privacy issues can arise from a wide range of sources, e.g.:
­ Healthcare records
­ Financial institutions/transactions
­ Criminal justice investigations
DBLC Phase 4: Testing and Evaluation
­ Testing ​performance​/performance fine tuning (over normalised?)
­ Testing ​security constraints
­ Testing ​integrity
­ Testing ​concurrent access
DBLC Phase 5: Operation
­ DB​ (and application) ​completed​, “Going into production/operation”
­ Running ​IS
­ Users​ and applications start to insert, receive, update and delete data…
­ DBA​ starts to (ongoing) fine­tune performance, allocate storage space, control access, backup data
­ Becomes ‘go­to’ guy
DBLC Phase 6: Maintenance and Evolution
­ DBA has responsibility for ​routine maintenance​ activities within DB:
­ Preventative​ maintenance (backup)
­ Corrective​ maintenance (recovery)
­ Adaptive​ maintenance (enhancing performance, adding entities, adding attributes, etc.)
­ Access control, statistics, ​auditing​, periodic system­usage summaries, etc.
9.4 Conceptual Design
­ Conceptual Design​ ­ first stage of DB design process. Goal is to design a DB that is independent of
DB software and physical details
­ Output: conceptual data model that describes main data entities, attributes, relationships, and
constraints
­ 4 steps:
1. Data analysis and requirements
2. Entity relationship modeling and normalisation
3. Data model verification
4. Distributed DB design
9.5 DBMS Software Selection
­ Selection of DBMS critical to IS’s smooth operation
­ Most common factors affecting purchase decision are:
­ Cost​ ­ inc. original purchase price, maintenance, operation, license, installation, training,
conversion
­ DBMS features and tools​ ­ some DBMS inc. tools to facilitate application development task
(e.g. query by example, report generators, data dictionaries, etc.), some make DBA job easier
(security, concurrency control, 3rd­party support etc.)
­ Underlying model​ ­ can be hierarchical, network, relational, object/relational, OO
­ Portability​ ­ DBMS can be portable across platforms, systems, and languages
­ DBMS Hardware requirements​ ­ inc. processor(s), RAM, disk space, and so on
9.6 Logical design
­ Logical Design​ ­ 2nd stage in DB design process. Goal is to design an enterprise­wide DB based on a
specific data model but independent of physical level details
­ All objects in conceptual model mapped to specific constructs
­ 4 steps:
1. Map conceptual model to logical model components
2. Validate logical model using normalisation
3. Validate logical model integrity constraints
4. Validate logical model against user requirements
9.7 Physical Design
­ Physical design​ ­ Process of determining the data storage organisation and data access
characteristics in order to ensure its integrity, security, and performance
­ 3 steps:
1. Define data storage organisation
2. Define integrity and security measures
3. Determine performance measures
9.8 Database Design Strategies
Two classical approaches to DB design
­ Top­down design​ ­ starts by identifying data sets then defines data elements for each of those sets.
(i.e. identification of different entity types then definition of each entity’s attributes)
­ Better when overwhelming number, variety and complexity of entities, relations and transactions
­ Bottom­up design ­ first identifies data elements (items), then groups them together in data sets (i.e. first defines attributes, then groups them to form entities)
­ More productive for small DBs with few entities, attributes, relations, transactions
­ Selection often depends on:
­ Scope of the problem
­ Personal preference
­ Company’s structure (centralised or decentralised)
­ The two approaches are complementary rather than mutually exclusive
9.9 Centralised vs Decentralised Design
­ Centralised design​ ­ Productive when the data component is composed of a relatively small no. of
objects and procedures
Chapter 15: Database Administration and Security
15.3 Introduction of a Database: Special considerations
DBMS and Organisation Change
­ DBMS is a software package with computer programs that control creation, maintenance and use of
databases (e.g. Oracle)
­ The introduction of a DBMS to an org may affect the org in various ways
­ When a new DBMS is introduced, three important aspects have to be addressed:
­ Technological​ ­ DBMS hardware and software
­ Managerial​ ­ Administrative functions
­ Cultural​ ­ Corporate resistance to change
­ Not just being more effective BUT must look at power structure/org culture (may resist change)
15.4 The Evolution of the Database Administrator Function
­ Database Administrator (DBA)​ ­ A technical function that is responsible for physical DB design and
for dealing with technical issues such as security, enforcement, DB performance, backup and recovery
­ Data Administrator (DA)​ ­ A high­level function that is responsible for the overall mgmt of data
resources in an org, inc. maintaining corporate­wide definitions and standards
­ Responsible for control of the overall company data resource
­ Job description covers a larger area of operations than the DBA ­ a higher­level (corporate) role
­ DBA ­ Responsible for the control of databases
­ Role varies between companies
­ Location of the DBA varies (is up to company mgmt)
­ Larger corporations make a distinction between DBA and DA
DBA Tasks/Responsibilities
­ Installing​ and upgrading the ​DB server
­ Allocating​ system ​storage​ and planning future storage
­ Modifying​ the DB ​structure
­ Enrolling users​ and maintaining system security
­ Ensuring compliance​ with DB vendor license agreement
­ Controlling ​and monitoring ​user access
­ Monitoring​ and optimising the ​performance
­ Planning ​for ​backup​ and ​recovery
­ Maintaining archived data
­ Backing up ​and ​restoring DBs
­ Contacting DB vendor (e.g. for technical support)
­ Generating various reports
DBA Ethics
Source: DBA Code of Ethics
­ Responsibilities to Company​: (don’t be unethical)
­ Follow internal standards and regulations
­ Inform openly about issues, provide complete info, do not create knowledge silo
­ Ensure up­to­date security, have recovery plan in place
­ Responsibilities to Externals​ (stakeholders):
­ Follow external regulations ­ standards, laws
­ Protect externals from inappropriate data use ­ could be whistleblowing BUT mainly acting
ethical
­ Ensure privacy through authorisation and security ­ critical data should be secured
­ Responsibilities to Co­Workers​:
­ Be honest and open with co­workers
­ Protect co­workers from inappropriate data use
­ Share, teach and help grow collective knowledge base
­ Responsibilities to Oneself:
­ Stay up to date on industry and tech
­ Stay up to date on regulations
­ Learn new techniques, new tools and best practices