The Database Development Process Dr. Emad M. Alsukhni Modern Database Management 7th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2005 by Prentice Hall 1 Systems Development Life Cycle (cont.) Project Identification and Selection Project Initiation and Planning Analysis Analysis 2- logical database design • The conceptual data model must be transformed into a logical data model, which describes the data in terms of the data management technology that will be used to implement the database 1- Database activity – Logical Design conceptual data modeling • Analyst produces a detailed Physical Design data model that identifies all the organizational data that must be managed for this information system( every data attribute is Implementation defined, all categories of data are listed, every business relationship between data entities is represented and every rule is specified) Dr. Emad M. Alsukhni Maintenance 2 Enterprise Data Model First step in database development Specifies scope and general content Its purpose is to create an overall picture of organizational data at high level of abstraction, not the design for a particular database Entity-relationship diagram Descriptions of entity types Relationships between entities Business rules Dr. Emad M. Alsukhni 3 Figure 2-1 Segment from enterprise data model (Pine Valley Furniture Company) [simplified E-R diagram, repeat of figure 1.3] Enterprise data model describes the high-level entities in an organization and the relationship between these entities Dr. Emad M. Alsukhni 4 Systems Development Life Cycle (cont.) (figures 2-4, 2-5) Purpose – programming, testing, training, installation, documenting Deliverable – operational programs, documentation, training materials Project Identification and Selection Project Initiation and Planning Analysis Database activity – database implementation Logical Design • Designer writes, tests, and installs the programs that process Physical Design the database • The designer will finalize all database Implementation Implementation documentation, train users, put procedures into place for the ongoing support of information system Maintenance (and databse) users and load data from existing information sources (files, and databases from legacy applications plus data now needed) Dr. Emad M. Alsukhni 5 Systems Development Life Cycle (cont.) (figures 2-4, 2-5) Project Identification and Selection Purpose – monitor, repair, enhance Deliverable – periodic audits Project Initiation and Planning Analysis Logical Design Database activity – database maintenance Physical Design • The designer adds, deletes, or changes characteristics of the structure Implementation of a database in order to meet changing business conditions • Correct errors in database design • Improve the processing speed of database applications Dr. Emad M. Alsukhni Maintenance Maintenance 6 Dr. Emad M. Alsukhni 7 Database Schema External Schema This is the view (or views) of managers and other employees who are the database users. Can be represented as a combination of the enterprise data model and a collection of detailed user views user view is a logical description of some portion of an enterprise database that is required by a user to perform some task Example, online order form used by customers of PVFC Dr. Emad M. Alsukhni 8 Database Schema Conceptual Schema Combines the different external views into a single, coherent definition of the enterprise’s data detailed specification of the overall structure of organizational data that is independent of any database management technology Internal Schema: consists of two separate schemas Logical schema: is the representation of data for a type of data management technology (e.g., relational) Physical schema: describes how data are to be represented and stored in secondary storage using a particular DBMS (e.g., Oracle) Dr. Emad M. Alsukhni 9 Different people have different views of the database…these are the external schema The internal schema is the underlying design and implementation Dr. Emad M. Alsukhni 10 Strategies for Development First, you will develop a first-cut conceptual schema based on the organization’s enterprise data model and a general understanding of the organization's data requirements on a project Then you will develop user views to represent data needs for the various users of this database In most cases, an analysis of the user views will yield new attributes and possibly entities and relationships not shown on the conceptual schema So, the conceptual schema is augmented with these requirements , thus making the conceptual and external schemas consistent This process of evolving the two schemas continues until the schemas are fully defined Then the internal schema (logical and physical models) are developed Dr. Emad M. Alsukhni 11 CASE Computer-Aided Software Engineering (CASE) – software tools providing automated support for systems development Dr. Emad M. Alsukhni 12 Relationship symbols Entity symbols Attribute symbols A special entity that is also a relationship Relationship degrees specify number of entity types involved Relationship cardinalities specify how many of each entity type is allowed Dr. Emad M. Alsukhni 13 Sample E-R Diagram (Figure 3-1) Dr. Emad M. Alsukhni 14 What Should an Entity Be? SHOULD BE: An object that will have many instances in the database An object that will be composed of multiple attributes An object that we are trying to model SHOULD NOT BE: A user of the database system An output of the database system (e.g. a report) Dr. Emad M. Alsukhni 15 Figure 3-4 Inappropriate entities System output System user Appropriate entities Dr. Emad M. Alsukhni 16 Attributes Attribute - property or characteristic of an entity type Classifications of attributes: Required versus Optional Attributes Simple versus Composite Attribute Single-Valued versus Multivalued Attribute Stored versus Derived Attributes Identifier Attributes Dr. Emad M. Alsukhni 17 Identifiers (Keys) Identifier (Key) - An attribute (or combination of attributes) that uniquely identifies individual instances of an entity type Simple Key versus Composite Key Candidate Key – an attribute that could be a key…satisfies the requirements for being a key Dr. Emad M. Alsukhni 18 Characteristics of Identifiers Will not change in value Will not be null No intelligent identifiers (e.g. containing locations or people that might change) Substitute new, simple keys for long, composite keys Dr. Emad M. Alsukhni 19 Figure 3-7 – A composite attribute An attribute broken into component parts Dr. Emad M. Alsukhni 20 Figure 3-9a – Simple key attribute The key is underlined Dr. Emad M. Alsukhni 21 Figure 3-9b – Composite key attribute The key is composed of two subparts Dr. Emad M. Alsukhni 22 Figure 3-8 – Entity with a multivalued attribute (Skill) and derived attribute (Years_Employed) What’s wrong with this? Multivalued: Derived from date employed and current date Dr. Emad M. Alsukhni an employee can have more than one skill 23 Figure 3-19 – An attribute that is both multivalued and composite This is an example of time-stamping Dr. Emad M. Alsukhni 24 More on Relationships Relationship Types vs. Relationship Instances Relationships can have attributes The relationship type is modeled as the diamond and lines between entity types…the instance is between specific entity instances These describe features pertaining to the association between the entities in the relationship Two entities can have more than one type of relationship between them (multiple relationships) Associative Entity – combination of relationship and entity Dr. Emad M. Alsukhni 25 Cardinality of Relationships One-to-One One-to-Many Each entity in the relationship will have exactly one related entity An entity on one side of the relationship can have many related entities, but an entity on the other side will have a maximum of one related entity Many-to-Many Entities on both sides of the relationship can have many related entities on the other side Dr. Emad M. Alsukhni 26 Dr. Emad M. Alsukhni 27 Dr. Emad M. Alsukhni 28 Entities can be related to one another in more than one way Dr. Emad M. Alsukhni 29 Relation Definition: A relation is a named, two-dimensional table of data Table consists of rows (records), and columns (attribute or field) Requirements for a table to qualify as a relation: It must have a unique name. Every attribute value must be atomic (not multivalued, not composite) Every row must be unique (can’t have two rows with exactly the same values for all their fields) Attributes (columns) in tables must have unique names The order of the columns must be irrelevant The order of the rows must be irrelevant NOTE: all relations are in 1st Normal form Dr. Emad M. Alsukhni 30 Correspondence with E-R Model Relations (tables) correspond with entity types and with many-to-many relationship types Rows correspond with entity instances and with many-to-many relationship instances Columns correspond with attributes NOTE: The word relation (in relational database) is NOT the same as the word relationship (in E-R model) Dr. Emad M. Alsukhni 31 Key Fields Keys are special fields that serve two main purposes: Primary keys are unique identifiers of the relation in question. Examples include employee numbers, social security numbers, etc. This is how we can guarantee that all rows are unique Foreign keys are identifiers that enable a dependent relation (on the many side of a relationship) to refer to its parent relation (on the one side of the relationship) Keys can be simple (a single field) or composite (more than one field) Keys usually are used as indexes to speed up the response to user queries (More on this in Ch. 6) Dr. Emad M. Alsukhni 32 Figure 5-5: Referential integrity constraints (Pine Valley Furniture) Referential integrity constraints are drawn via arrows from dependent to parent table Dr. Emad M. Alsukhni 33 Referential integrity constraints are implemented with foreign key to primary key references Dr. Emad M. Alsukhni 34 Data Normalization Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data The process of decomposing relations with anomalies to produce smaller, well-structured relations Dr. Emad M. Alsukhni 35 Well-Structured Relations A relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies Goal is to avoid anomalies Insertion Anomaly – adding new rows forces user to create duplicate data Deletion Anomaly – deleting rows may cause a loss of data that would be needed for other future rows Modification Anomaly – changing data in a row forces changes to other rows because of duplication General rule of thumb: a table should not pertain to more than one entity type Dr. Emad M. Alsukhni 36 Example – Figure 5.2b Question – Is this a relation? Answer – Yes: unique rows and no multivalued attributes Question – What’s the primary key? Answer – Composite: Emp_ID, Course_Title Dr. Emad M. Alsukhni 37 Anomalies in this Table Insertion – can’t enter a new employee without having the employee take a class Deletion – if we remove employee 140, we lose information about the existence of a Tax Acc class Modification – giving a salary increase to employee 100 forces us to update multiple records Why do these anomalies exist? Because there are two themes (entity types) into one relation. This results in duplication, and an unnecessary dependency between the entities Dr. Emad M. Alsukhni 38 Figure 5.22 Steps in normalization Dr. Emad M. Alsukhni 39 First Normal Form No multivalued attributes Every attribute value is atomic Fig. 5-25 is not in 1st Normal Form (multivalued attributes) it is not a relation Fig. 5-26 is in 1st Normal form All relations are in 1st Normal Form Dr. Emad M. Alsukhni 40 Table with multivalued attributes, not in 1st normal form Note: this is NOT a relation Dr. Emad M. Alsukhni 41 Table with no multivalued attributes and unique rows, in 1st normal form Note: this is relation, but not a well-structured one Dr. Emad M. Alsukhni 42 Anomalies in this Table Insertion – if new product is ordered for order 1007 of existing customer, customer data must be re-entered, causing duplication Deletion – if we delete the Dining Table from Order 1006, we lose information concerning this item's finish and price Update – changing the price of product ID 4 requires update in several records Why do these anomalies exist? Because there are multiple themes (entity types) into one relation. This results in duplication, and an unnecessary dependency between the entities Dr. Emad M. Alsukhni 43 Second Normal Form 1NF PLUS every non-key attribute is fully functionally dependent on the ENTIRE primary key Every non-key attribute must be defined by the entire key, not by only part of the key No partial functional dependencies Dr. Emad M. Alsukhni 44 Order_ID Order_Date, Customer_ID, Customer_Name, Customer_Address Customer_ID Customer_Name, Customer_Address Product_ID Product_Description, Product_Finish, Unit_Price Order_ID, Product_ID Order_Quantity Therefore, NOT in 2nd Normal Form Dr. Emad M. Alsukhni 45 Getting it into Second Normal Form Partial Dependencies are removed, but there are still transitive dependencies Dr. Emad M. Alsukhni 46 Third Normal Form 2NF PLUS no transitive dependencies (functional dependencies on non-primary-key attributes) Note: this is called transitive, because the primary key is a determinant for another attribute, which in turn is a determinant for a third Solution: non-key determinant with transitive dependencies go into a new table; non-key determinant becomes primary key in the new table and stays as foreign key in the old table Dr. Emad M. Alsukhni 47 Getting it into Third Normal Form Transitive dependencies are removed Dr. Emad M. Alsukhni 48