Week 11, November 7
• Data Normalization and ERD
• Conceptual, Logical and Physical Database Design
R. Ching, Ph.D. • MIS • California State University, Sacramento

Data Normalization
• The purpose of normalization is to produce a stable set of relations that is a faithful model of the operations of the enterprise.
– Achieve a design that is highly flexible
– Reduce redundancy
– Ensure that the design is free of certain update, insertion and deletion anomalies
Catherine Ricardo, 1990

Normalization
1NF    Flat file
2NF    Partial dependencies removed
3NF    Transitive dependencies removed
BCNF   Every determinant is a candidate key
4NF    Non-trivial multivalued dependencies removed

Sample Invoice: Stereos To Go ("Go, Hogs")
Order No.: 10001          Date: 6/15/99          Date Shipped: 6/18/99
Customer: John Smith
Address: 2036-26 Street, Sacramento, CA 95819
Account No.: 0000-000-0000-0 (John Smith, 1/05)

Item  Product Code  Product Description/Manufacturer   Qty  Price
1     SAGX730       Pioneer Remote A/V Receiver        1    569.95
2     AT10          Cerwin Vega Loudspeakers           1    359.95
3     CDPC725       Sony Disc-Jockey CD Changer        1    399.95

Subtotal               1,329.85
Shipping & Handling      100.00
Sales Tax                103.06
Total                  1,532.91

Unnormalized Relation
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item1, Item1_descrip, Item1_qty, Item1_price, Item2, Item2_descrip, Item2_qty, Item2_price, . . . , Item7, Item7_descrip, Item7_qty, Item7_price)
How would a program process the data to recreate the invoice?
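To make the question concrete, here is a minimal Python sketch, assuming the flat record has been read into a dict keyed by the attribute names above (the dict layout and the helpers `invoice_lines` and `subtotal` are hypothetical). The program has to walk a fixed set of seven item slots and skip the empty ones; an eighth item would have nowhere to go.

```python
from decimal import Decimal

MAX_ITEMS = 7  # the unnormalized layout reserves exactly seven item slots

# Line items from the sample invoice; slots 4-7 stay empty.
purchases = [
    ("SAGX730", "Pioneer Remote A/V Receiver", 1, Decimal("569.95")),
    ("AT10", "Cerwin Vega Loudspeakers", 1, Decimal("359.95")),
    ("CDPC725", "Sony Disc-Jockey CD Changer", 1, Decimal("399.95")),
]

# Hypothetical flat record: one dict entry per attribute of the unnormalized relation.
record = {"Invoice_number": 10001, "Cust_name": "John Smith"}
for n in range(1, MAX_ITEMS + 1):
    code, descrip, qty, price = purchases[n - 1] if n <= len(purchases) else (None,) * 4
    record[f"Item{n}"] = code
    record[f"Item{n}_descrip"] = descrip
    record[f"Item{n}_qty"] = qty
    record[f"Item{n}_price"] = price


def invoice_lines(rec):
    """Walk the fixed Item1..Item7 slots and return (slot, code, descrip, qty, price) for filled slots."""
    lines = []
    for n in range(1, MAX_ITEMS + 1):
        if rec[f"Item{n}"] is None:  # empty slot: nothing was purchased in this position
            continue
        lines.append((n, rec[f"Item{n}"], rec[f"Item{n}_descrip"],
                      rec[f"Item{n}_qty"], rec[f"Item{n}_price"]))
    return lines


def subtotal(rec):
    """Sum quantity x price over the filled slots."""
    return sum((qty * price for _, _, _, qty, price in invoice_lines(rec)), Decimal("0"))


for line in invoice_lines(record):
    print(*line)
print("Subtotal", subtotal(record))
```

Every program that reads this record needs the same slot-walking loop, which is one of the motivations for moving the repeating groups into separate rows.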
Unnormalized to 1NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item1, Item1_descrip, Item1_qty, Item1_price, Item2, Item2_descrip, Item2_qty, Item2_price, . . . , Item7, Item7_descrip, Item7_qty, Item7_price)
Repeating groups: Item1 through Item7, each with its description, quantity and price
A flat file places all the data of a transaction into a single record. This is reminiscent of a COBOL or BASIC program processing a single transaction with one read statement.

Unnormalized to 1NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item, Item_descrip, Item_qty, Item_price)
Nominated group of attributes to serve as the key (forms a unique combination)
• Eliminate the repeating groups.
• Each row retains data for one item.
• If a person bought 5 items, we would have five tuples.

1NF Flat File
Invoice and customer data        Item     Item Description           Item Quantity  Item Price
10001  123456  John Smith  •••   SAGX730  Pioneer Remote A/V Rec     1              569.95
10001  123456  John Smith  •••   AT10     Cerwin Vega Loudspeakers   1              359.95
10001  123456  John Smith  •••   CDPC725  Sony Disc Jockey CD        1              399.95
10001  123456  John Smith  •••   S/H      Shipping                   1              100.00
10001  123456  John Smith  •••   Tax      Sales Tax                  1              103.06

From 1NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item, Item_descrip, Item_qty, Item_price)
Functional dependencies and determinants
Example: Item_descrip is functionally dependent on Item, such that Item is the determinant of Item_descrip (Item → Item_descrip).
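A functional dependency can be tested against sample rows. The sketch below uses a hypothetical helper `fd_holds` and an invented second invoice 10002; it shows that sample data can refute a dependency such as Item → Item_qty, but it can never prove one, because a functional dependency is a rule about every possible row, not only the rows on hand.

```python
# 1NF rows from the flat file above, plus one hypothetical second invoice (10002)
# so that item AT10 appears on more than one invoice.
rows = [
    {"Invoice_number": 10001, "Item": "SAGX730", "Item_descrip": "Pioneer Remote A/V Rec", "Item_qty": 1},
    {"Invoice_number": 10001, "Item": "AT10", "Item_descrip": "Cerwin Vega Loudspeakers", "Item_qty": 1},
    {"Invoice_number": 10001, "Item": "CDPC725", "Item_descrip": "Sony Disc Jockey CD", "Item_qty": 1},
    {"Invoice_number": 10002, "Item": "AT10", "Item_descrip": "Cerwin Vega Loudspeakers", "Item_qty": 2},
]


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on the lhs attributes but differ on the rhs attributes."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # The first row with this determinant fixes the expected dependent values.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


print(fd_holds(rows, ["Item"], ["Item_descrip"]))                # True
print(fd_holds(rows, ["Item"], ["Item_qty"]))                    # False: AT10 has qty 1 and 2
print(fd_holds(rows, ["Invoice_number", "Item"], ["Item_qty"]))  # True
```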
From 1NF to 2NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)
(Item, Item_descrip, Item_qty, Item_price)
Is Item unique by itself? What happens if the item is purchased more than once?

From 1NF to 2NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)
(Invoice_number, Item, Item_descrip, Item_qty, Item_price)
Composite key: Invoice_number + Item (forms a unique combination)
Partial dependency: Item_descrip depends on Item alone, which is only part of the key

From 1NF to 2NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)
(Invoice_number, Item, Item_qty, Item_price)
(Item, Item_descrip)

From 2NF to 3NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)
(Invoice_number, Item, Item_qty, Item_price)
(Item, Item_descrip)
Which attributes are dependent on others? Is there a problem?

Transitive Dependencies and Anomalies
• Insertion anomalies
– To add a new row, all customer data (name, address, city, state, zip code, phone) and product data (description) must be consistent with previous entries
• Deletion anomalies
– By deleting a row, a customer or product may cease to exist
• Modification anomalies
– To modify a customer's or product's data in one row, the same modification must be carried out in all other rows
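Moving from 1NF to 2NF amounts to projecting the flat rows onto smaller attribute sets and discarding duplicate tuples. A minimal sketch, assuming rows are Python dicts; the helper `project` is hypothetical, and the second invoice 10002 (customer Lisa Carr) that repeats item AT10 is invented. The description of AT10 ends up stored once in the Items projection even though the item appears on two invoices.

```python
from decimal import Decimal

COLUMNS = ("Invoice_number", "Cust_name", "Item", "Item_descrip", "Item_qty", "Item_price")
DATA = [
    (10001, "John Smith", "SAGX730", "Pioneer Remote A/V Rec", 1, Decimal("569.95")),
    (10001, "John Smith", "AT10", "Cerwin Vega Loudspeakers", 1, Decimal("359.95")),
    (10001, "John Smith", "CDPC725", "Sony Disc Jockey CD", 1, Decimal("399.95")),
    (10002, "Lisa Carr", "AT10", "Cerwin Vega Loudspeakers", 2, Decimal("359.95")),  # hypothetical
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]


def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples (first-seen order kept)."""
    return list(dict.fromkeys(tuple(row[a] for a in attrs) for row in rows))


invoices = project(rows, ["Invoice_number", "Cust_name"])                          # key: Invoice_number
invoice_items = project(rows, ["Invoice_number", "Item", "Item_qty", "Item_price"])  # key: Invoice_number + Item
items = project(rows, ["Item", "Item_descrip"])                                      # key: Item

for relation in (invoices, invoice_items, items):
    print(relation)
```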
Insertion and Modification Anomalies
For example… insert a new Panasonic product:
Product_code  Manufacturer_name
DVD-A110      Panasonic
PV-4210       Panasonic
PV-4250       Panasonic
CT-32S35      PAN

Inconsistency:
Product_code  Manufacturer_name
DVD-A110      Panasonic
PV-4210       PanaSonic
PV-4250       Pana Sonic
CT-32S35      PAN

Change all Panasonic products' manufacturer name to "Panasonic USA": every row, under every spelling, must be found and updated.

Deletion Anomaly
For example…
Account  Name        City        State  Zip code
4377182  John Smith  Sacramento  CA     95831
4398711  Arnold S    Davis       CA     95691
4578461  Gray Davis  Sacramento  CA     95831
4873179  Lisa Carr   Reno        NV     89557
By deleting customer Arnold S, we would also be deleting Davis, California.

Transitive Dependencies
A condition where A, B and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
Invoice_number → Invoice_date, Date_delivered, Cust_account
Cust_account → Cust_name, Cust_addr, Zip_code
Zip_code → Cust_city, Cust_state
Item → Item_descrip
Invoice_number + Item → Item_qty, Item_price

Why Should City and State Be Separated from the Customer Relation?
• City and state depend on the zip code for their values, not on the customer's identifier (i.e., key).
Zip_code → City, State
• Otherwise, Cust_account → Cust_addr, Zip_code and Zip_code → City, State, in which case you have a transitive dependency.

3NF
Invoice Relation (Invoice_number, Invoice_date, Date_delivered, Cust_account)
Customer Relation (Cust_account, Cust_name, Cust_addr, Zip_code)
Zip_code Relation (Zip_code, City, State)
Invoice_items Relation (Invoice_number, Item, Item_qty, Item_price)
Items Relation (Item, Item_descrip)
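The transitive-dependency definition can be checked mechanically with the standard attribute-closure algorithm. The sketch below is illustrative: the helper names `closure` and `transitive_dependencies` are hypothetical, and it only examines the dependencies it is given. It flags City and State as transitively dependent on Cust_account via Zip_code, which is the reason for the separate Zip_code relation.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs (standard attribute-closure loop)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_dependencies(key, fds):
    """Return (C, B) pairs where key -> B and B -> C hold, but key is not dependent on B."""
    reachable = closure(key, fds)
    found = []
    for lhs, rhs in fds:
        # B = lhs must be determined by the key but must not determine the key back.
        if lhs <= reachable and not key <= closure(lhs, fds):
            for attr in sorted(rhs - lhs - key):
                found.append((attr, tuple(sorted(lhs))))
    return found


# Dependencies of the customer data before City and State are split off.
FDS = [
    ({"Cust_account"}, {"Cust_name", "Cust_addr", "Zip_code"}),
    ({"Zip_code"}, {"City", "State"}),
]
print(sorted(closure({"Cust_account"}, FDS)))
print(transitive_dependencies({"Cust_account"}, FDS))  # [('City', ('Zip_code',)), ('State', ('Zip_code',))]
```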
3NF
Invoice Relation (Invoice_number, Invoice_date, Date_delivered, Cust_account)
Customer Relation (Cust_account, Cust_name, Cust_addr, Zip_code)
Zip_code Relation (Zip_code, City, State)
Invoice_items Relation (Invoice_number, Item, Item_qty, Item_price)
Items Relation (Item, Item_descrip)
Manufacturers Relation (Manuf_code, Manuf_name)
Since the Items relation contains the manufacturer's name in the description, a separate Manufacturers relation can be created.

First to Third Normal Form (1NF–3NF)
• 1NF: A relation is in first normal form if and only if every attribute is single-valued for each tuple (remove the repeating or multivalued attributes and create a flat file)
• 2NF: A relation is in second normal form if and only if it is in first normal form and the nonkey attributes are fully functionally dependent on the key (remove partial dependencies)
• 3NF: A relation is in third normal form if it is in second normal form and no nonkey attribute is transitively dependent on the key (remove transitive dependencies)

Putting It Together
ERD of the Normalized Data Model

3NF
Invoice Relation (Invoice_number, Invoice_date, Date_delivered, Cust_account)
Customer Relation (Cust_account, Cust_name, Cust_addr, Zip_code)
Zip_code Relation (Zip_code, City, State)
Invoice_items Relation (Invoice_number, Item, Item_qty, Item_price)
Items Relation (Item, Item_descrip, Manuf_code)
Manufacturers Relation (Manuf_code, Manuf_name)
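Validating the 3NF design against the transaction means asking whether the original invoice can be rebuilt. A minimal sketch, assuming each relation is held as a Python dict keyed by its primary key; the Cust_account value 123456 is an assumption, and the manufacturer codes PIO, CV and SON are the ones used in the denormalization examples later in these notes. Each lookup corresponds to one join along a foreign key.

```python
from decimal import Decimal

# Each 3NF relation as a dict keyed by its primary key (sample values; Cust_account is assumed).
invoices = {10001: {"Invoice_date": "6/15/99", "Date_delivered": "6/18/99", "Cust_account": 123456}}
customers = {123456: {"Cust_name": "John Smith", "Cust_addr": "2036-26 Street", "Zip_code": "95819"}}
zip_codes = {"95819": {"City": "Sacramento", "State": "CA"}}
manufacturers = {"PIO": "Pioneer", "CV": "Cerwin Vega", "SON": "Sony"}
items = {
    "SAGX730": {"Item_descrip": "Remote A/V Receiver", "Manuf_code": "PIO"},
    "AT10": {"Item_descrip": "Loudspeakers", "Manuf_code": "CV"},
    "CDPC725": {"Item_descrip": "Disc-Jockey CD Changer", "Manuf_code": "SON"},
}
invoice_items = {  # composite key (Invoice_number, Item)
    (10001, "SAGX730"): {"Item_qty": 1, "Item_price": Decimal("569.95")},
    (10001, "AT10"): {"Item_qty": 1, "Item_price": Decimal("359.95")},
    (10001, "CDPC725"): {"Item_qty": 1, "Item_price": Decimal("399.95")},
}


def recreate_invoice(number):
    """Rebuild the customer header, line items and subtotal for one invoice."""
    inv = invoices[number]
    cust = customers[inv["Cust_account"]]  # Invoices -> Customers
    place = zip_codes[cust["Zip_code"]]    # Customers -> Zip_Codes
    header = f"{cust['Cust_name']}, {cust['Cust_addr']}, {place['City']} {place['State']} {cust['Zip_code']}"
    lines = []
    for (inv_no, item), detail in invoice_items.items():
        if inv_no != number:
            continue
        it = items[item]                         # Invoice_items -> Items
        maker = manufacturers[it["Manuf_code"]]  # Items -> Manufacturers
        lines.append((item, f"{maker} {it['Item_descrip']}", detail["Item_qty"], detail["Item_price"]))
    subtotal = sum((qty * price for _, _, qty, price in lines), Decimal("0"))
    return header, lines, subtotal


header, lines, subtotal = recreate_invoice(10001)
print(header)
for line in lines:
    print(*line)
print("Subtotal", subtotal)
```

The lines come back in dict insertion order here; a relation itself guarantees no row order, a point the denormalization slides return to.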
ERD
Invoices (Invoice_number, Invoice_date, Date_delivered, Cust_account)
Invoice_items (Invoice_number, Item, Item_qty, Item_price)
Customers (Cust_account, Cust_name, Cust_addr, Zip_code)
Items (Item, Item_descrip, Manuf_code)
Zip_Codes (Zip_code, City, State)
Manufacturers (Manuf_code, Manuf_name)

ERD relationships (each multiplicity is written next to the entity it counts):
Customers (1..1) Order (0..*) Invoices
Zip_Codes (1..1) Locate (0..*) Customers
Invoices (1..1) Have (1..*) Invoice_items
Items (1..1) Appear on (0..*) Invoice_items
Manufacturers (1..1) Produce (0..*) Items

ERD: Zip codes locate customers (partial)
• A zip code can be related to a minimum of zero and a maximum of many customers.
• A customer can be related to a minimum and maximum of one zip code.

ERD: Customers order (items) on invoices (partial)
• A customer can be related to a minimum of zero and a maximum of many invoices.
• An invoice can be related to a minimum and maximum of one customer.
ERD: Invoices possess invoice items (mandatory)
• An invoice can be related to a minimum of one and a maximum of many invoice items.
• An invoice item can be related to a minimum and maximum of one invoice.

ERD: Items are sold on invoice items (partial)
• An item can be related to a minimum of zero and a maximum of many invoice items.
• An invoice item can be related to a minimum and maximum of one item.

ERD: Manufacturers produce items (partial)
• A manufacturer can be related to a minimum of zero and a maximum of many items.
• An item can be related to a minimum and maximum of one manufacturer.
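The multiplicities on the ERD can be checked against sample data. A sketch with a hypothetical helper `check_relationship`: it treats the (1..1) end as "every child row references exactly one existing parent" and the other end as a minimum child count, 1 for mandatory participation (Have) and 0 for partial participation (Appear on). The sample data, including an invoice 10002 with no items, is invented.

```python
from collections import Counter


def check_relationship(parent_keys, child_fks, min_children):
    """Check a (1..1) parent to (min_children..*) child relationship.

    parent_keys:  primary keys of the parent entity (e.g. Invoice_number values)
    child_fks:    the foreign-key value stored in each child row
    min_children: 1 for mandatory participation, 0 for partial participation
    Returns a list of problems; an empty list means the data conforms.
    """
    problems = []
    parents = set(parent_keys)
    for fk in child_fks:
        # (1..1) on the parent side: each child must point to exactly one existing parent.
        if fk is None or fk not in parents:
            problems.append(f"child references missing parent {fk!r}")
    counts = Counter(child_fks)
    for key in parent_keys:
        # (min..*) on the child side.
        if counts[key] < min_children:
            problems.append(f"parent {key!r} has {counts[key]} children, needs at least {min_children}")
    return problems


# Hypothetical sample data: invoice 10002 has no line items yet; DVD-A110 has never been sold.
invoice_numbers = [10001, 10002]
item_codes = ["SAGX730", "AT10", "CDPC725", "DVD-A110"]
invoice_item_rows = [(10001, "SAGX730"), (10001, "AT10"), (10001, "CDPC725")]

# Invoices (1..1) Have (1..*) Invoice_items: mandatory
have = check_relationship(invoice_numbers, [inv for inv, _ in invoice_item_rows], 1)
# Items (1..1) Appear on (0..*) Invoice_items: partial
appear_on = check_relationship(item_codes, [item for _, item in invoice_item_rows], 0)
print(have)
print(appear_on)
```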
Higher Forms of Data Normalization
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
• Domain-Key Normal Form (DKNF)

Boyce-Codd Normal Form (BCNF)
• A relation is in Boyce-Codd normal form if and only if every determinant is a candidate key.
Attribute A → Attribute B: A determines B, so A is the determinant (B is functionally dependent on A).
• For a relation with only one candidate key, 3NF and BCNF are equivalent.
• A 3NF relation that violates BCNF usually has two or more composite candidate keys that overlap.

BCNF Example
User (UserID, Dept, Name, ComputerID, EmpClassification)
ComputerID → Dept (a department issues a computer)
UserID, Dept → ComputerID, Name, EmpClassification (employees may have the same name, and UserIDs are unique within the department only)
UserID, ComputerID → Dept, Name, EmpClassification
ComputerID is a determinant but not a candidate key, so User is not in BCNF.
BCNF:
UserComputer (ComputerID, Dept)
User (UserID, ComputerID, Name, EmpClassification)

From 3NF to BCNF
Invoice Relation (Invoice_number, Invoice_date, Date_delivered, Cust_account)
Customer Relation (Cust_account, Cust_name, Cust_addr, Zip_code)
Zip_code Relation (Zip_code, City, State)
Invoice_items Relation (Invoice_number, Item, Item_qty, Item_price)
Items Relation (Item, Item_descrip)
Manufacturers Relation (Manuf_code, Manuf_name)
Candidate keys?

Fourth Normal Form (4NF)
• A relation is in fourth normal form if and only if it is in Boyce-Codd normal form and there are no nontrivial multivalued dependencies.
– Identify all determinants and make sure they are candidate keys
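The BCNF test on the User example can be automated: for every nontrivial dependency, compute the closure of its determinant and check that it covers the whole relation (i.e., the determinant is a key). A minimal sketch; the helper names are hypothetical, and the dependencies for the two decomposed relations are written out by hand rather than derived.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Determinants of nontrivial dependencies whose closure does not cover the relation."""
    return [lhs for lhs, rhs in fds
            if not rhs <= lhs and not relation <= closure(lhs, fds)]


USER = {"UserID", "Dept", "Name", "ComputerID", "EmpClassification"}
USER_FDS = [
    ({"ComputerID"}, {"Dept"}),
    ({"UserID", "Dept"}, {"ComputerID", "Name", "EmpClassification"}),
    ({"UserID", "ComputerID"}, {"Dept", "Name", "EmpClassification"}),
]

# Decomposed relations, with their dependencies listed by hand.
USER_COMPUTER = {"ComputerID", "Dept"}
USER_COMPUTER_FDS = [({"ComputerID"}, {"Dept"})]
USER_BCNF = {"UserID", "ComputerID", "Name", "EmpClassification"}
USER_BCNF_FDS = [({"UserID", "ComputerID"}, {"Name", "EmpClassification"})]

print(bcnf_violations(USER, USER_FDS))                    # [{'ComputerID'}]
print(bcnf_violations(USER_COMPUTER, USER_COMPUTER_FDS))  # []
print(bcnf_violations(USER_BCNF, USER_BCNF_FDS))          # []
```

ComputerID determines only Dept, so its closure falls short of the relation and it is reported; both composite determinants reach every attribute.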
4NF Example (matrix management)
Employee (EmployeeID, Dept, Project)
EmployeeID  Dept       Project
100         Finance    F177-99
100         Marketing  F177-99
100         Finance    F288-00
102         Finance    F288-00
102         Marketing  F177-99
102         Finance    F177-99
Multivalued dependencies: an employee's departments and projects are independent of each other (EmployeeID →→ Dept, EmployeeID →→ Project)

4NF
Employee (EmployeeID, Dept)
EmployeeID  Dept
100         Finance
100         Marketing
102         Finance
102         Marketing

Projects (EmployeeID, Project)
EmployeeID  Project
100         F177-99
100         F288-00
102         F288-00
102         F177-99

Fifth Normal Form (5NF), aka Project-Join Normal Form
• A relation is in fifth normal form if no remaining nonloss projections (i.e., projections that preserve all information contained in the original relation) are possible, except the trivial one in which the key appears in each projection.
– The join of all projections will result in the original relation
– No systematic method exists for obtaining 5NF or for ensuring that a set of relations is indeed in 5NF
Ricardo, 1990

Domain-Key Normal Form (DKNF)
• A relation is in domain-key normal form if every constraint is a logical consequence of domain constraints or key constraints (i.e., all possible values are a result of an imposed constraint)
– There is no proven method of converting a design to DKNF, so it remains an ideal rather than a state that can readily be achieved
Ricardo, 1990

DKNF
For example: Emp_ID, Emp_name, Classification, Position, Salary
Domain for Classification:
• Executive
• Manager
• Staff
Domain for Position:
• Strategic Planner
• CIO
• Vice President
Domain for Position:
• Programmer/Analyst I
• Programmer/Analyst II
• Database/Analyst I

Database Design Methodology
Conceptual database design
• Build conceptual representation of the database
Logical database design
• Translate conceptual representation to logical structure of the database
Physical database design
• Operationalize logical structure in a physical implementation

Conceptual Database Design
• The process of constructing a model of the data used in an enterprise, independent of all physical considerations
• What's involved…
– Identify entity types, relationship types
– Identify and associate attributes with entity or relationship types
– Determine attribute domains
– Determine candidate, primary and alternate key attributes
– Consider use of enhanced modeling concepts
– Check model for redundancies
– Validate conceptual model against user transactions
– Review conceptual data model with the users

Logical Database Design
• The process of constructing a model of the data used in an enterprise based on a specific data model, but independent of a particular DBMS and other physical considerations
• What's involved…
– Derive relations for logical data model
– Validate relations using data normalization
– Validate relations against user transactions
– Check integrity constraints
– Review logical data model (ERD) with the users
– Merge logical data models into global data model
– Check for future growth

Gather Information
• Meet with the users to gather information
– Interviews
– Documents
Derive Relations
Invoices have invoice items (one-to-many relationship)
Invoice (strong entity type)
   Invoice number (pk), Invoice date, Delivery date, Sales type, Customer account
Have: Invoice 1..1, Invoice Items 1..*
   Mandatory (all invoices must have at least one invoice item)
Invoice Items (weak entity type: Invoice number is part of the key)
   Invoice number (pk), Product code (pk), Manufacturer code, Quantity, Sales price
• Strong and weak entity types
• Relationship types (cardinality)
• Participation (mandatory vs. partial)

Validate Relations
• Normalize relations
• Validate against transactions: can a transaction be recreated given the data retained in the relations?
• Check integrity constraints
– Required data (not null)
– Domain constraints (in, references)
– Multiplicity
– Entity integrity (primary key)
– Referential integrity (foreign key)
– General constraints (business rules)

Review Data Model with the Users
• Be pleasant and professional, not arrogant, challenging or condescending
– Not everyone is receptive to change
– Your role is to facilitate change
• The user is always "right"
– It's his/her data
• Document all change requests (CYA)
• Listen, listen, listen… (even if you don't agree)

Logical Global Data Model
Local data models:
• Invoice Records: transactions
• Inventory: counts and retail prices
• Cust Accounts: customer credit accounts
• Cust Billing: customer credit sales
• Vendor History: vendor performance
• Product Sales: sales history
→ Global Data Model
Local data models are merged to create a (near) normalized global data model.

Physical Database Design
• The process of producing a description of the implementation of the database on secondary storage
• It describes the base relations, file organizations and indexes used to achieve efficient access to the data, and any associated integrity constraints and security measures
• What's involved…
– Translate logical data model for target DBMS: design base relations, representation of derived data and general constraints
– Design file organizations and indexes (dictated by the DB product): analyze transactions, choose file organizations, choose indexes, estimate disk space requirements
– Design user views and security mechanisms
– Consider the introduction of controlled redundancy
– Monitor and tune the operational system

Logical vs. Physical Database Design
• Logical: The process of constructing a model of the information used in the enterprise based on one model of data, BUT independent of a particular DBMS and other physical aspects.
• Physical: The process of producing a description of the implementation of the database on secondary storage; it describes the storage structures and access methods used to gain access effectively.
Whereas logical database design is concerned with the what, physical database design is concerned with the how.

Physical Database Design
Five steps:
• Translate the global (enterprise) logical data model for the target DBMS
• Design file organizations and indexes, and estimate database space (disk space requirements)
• Design and implement user views and security mechanisms
• Consider the introduction of controlled redundancy (denormalization)
• Monitor and tune the operational system
Translate the Global Logical Database Model for the Target DBMS
• Design the relations for the target DBMS
– Decide how to represent the base relations in the global logical data model in the target DBMS
• Specify keys (primary, foreign), default values, integrity constraints (table, column) and indexes
• Design integrity rules for the target DBMS
– Design the enterprise constraints for the target DBMS
• Applies to updates and inserts

Design and Implement the Physical Representation
• Determine the file organizations and access methods that will be used to store the base relations (i.e., the way in which relations and tuples will be held in secondary storage). Depends on the vendor!
– Understand the system resources
• Understand the capabilities of the hardware (CPU, memory, disk I/O)
• Analyze the software's performance and limitations on the network (client/server) and Internet

Design and Implement the Physical Representation
• Analyze the transactions: understand the functionality of the transactions that will run on the database, and analyze the important transactions
• Choose file organization
• Choose secondary indexes: determine whether secondary indexes will enhance performance
– Index the primary key (if it is not the key of the file organization)
– Do not index small relations
– Add a secondary index to a heavily used secondary key
– Add a secondary index to a frequently used foreign key
– AVOID INDEXING AN ATTRIBUTE OR RELATION THAT IS FREQUENTLY UPDATED
– Avoid indexing an attribute if the query will retrieve a large portion of the tuples in a relation
– Avoid indexing attributes that consist of long character strings
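As one illustration of translating part of the 3NF model for a target DBMS, the sketch below uses SQLite through Python's standard `sqlite3` module. SQLite is chosen only because it needs no server; the column types, the DEFAULT, the CHECK rules, ISO-format dates and the index name `idx_invoices_cust` are assumptions. It shows a primary key, a composite key, a foreign key, required data, domain checks, a default value and a secondary index on a foreign key. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

DDL = """
CREATE TABLE Invoices (
    Invoice_number INTEGER PRIMARY KEY,   -- entity integrity
    Invoice_date   TEXT NOT NULL,         -- required data
    Date_delivered TEXT,
    Cust_account   INTEGER NOT NULL       -- foreign key to Customers (table omitted here)
);
CREATE TABLE Invoice_items (
    Invoice_number INTEGER NOT NULL REFERENCES Invoices(Invoice_number),  -- referential integrity
    Item           TEXT    NOT NULL,
    Item_qty       INTEGER NOT NULL DEFAULT 1 CHECK (Item_qty > 0),       -- default + domain rule
    Item_price     NUMERIC NOT NULL CHECK (Item_price >= 0),
    PRIMARY KEY (Invoice_number, Item)                                     -- composite key
);
-- Secondary index on a frequently used foreign key.
CREATE INDEX idx_invoices_cust ON Invoices(Cust_account);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(DDL)

conn.execute("INSERT INTO Invoices VALUES (10001, '1999-06-15', '1999-06-18', 123456)")
# Item_qty is omitted, so the DEFAULT of 1 applies.
conn.execute("INSERT INTO Invoice_items (Invoice_number, Item, Item_price) VALUES (10001, 'SAGX730', 569.95)")


def rejected(sql):
    """Run one statement; return True if the DBMS refused it with an integrity error."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


print(rejected("INSERT INTO Invoice_items VALUES (99999, 'AT10', 1, 359.95)"))  # no such invoice
print(rejected("INSERT INTO Invoice_items VALUES (10001, 'AT10', 0, 359.95)"))  # qty fails CHECK
```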
Design and Implement the Physical Representation
• Consider the introduction of controlled redundancy
– Determine whether introducing redundancy in a controlled manner by relaxing the normalization rules will enhance performance
• Denormalize only when necessary
– However, denormalizing
» Makes implementation more complex
» Sacrifices flexibility
» May slow down updates (although retrievals may be faster)

3NF (Logical Database Design)
Invoice Relation (Invoice_number, Invoice_date, Date_delivered, Cust_account)
Customer Relation (Cust_account, Cust_name, Cust_addr, Zip_code)
Zip_code Relation (Zip_code, City, State)
Invoice_items Relation (Invoice_number, Item, Item_qty, Item_price)
Items Relation (Item, Item_descrip, Manuf_code)
Manufacturers Relation (Manuf_code, Manuf_name)

Denormalization
• Duplicating attributes or combining relations
– Combining 1:1 relationships
Customers Relation (Cust_account, Cust_name, Cust_addr, Zip_code)
Customer_accounts Relation (Cust_account, Account_type, Credit_limit, Current_balance, Pay_history)
become
Customers Relation (Cust_account, Cust_name, Cust_addr, Zip_code, Account_type, Credit_limit, Current_balance, Pay_history)
– Duplicating nonkey attributes in 1:M relationships to reduce joins (creating partial or transitive dependencies)
Customers Relation (Cust_account, Cust_name, Cust_addr, Zip_code, Account_type, Credit_limit, Current_balance, Pay_history)
Zip_codes Relation (Zip_code, City, State)
become
Customers Relation (Cust_account, Cust_name, Cust_addr, City, State, Zip_code, Account_type, Credit_limit, Current_balance, Pay_history)
– Reference tables (introducing transitive dependencies)
Invoice_items Relation (Invoice_number, Item, Item_qty, Item_price)
Items Relation (Item, Item_descrip, Manuf_code)
Manufacturers Relation (Manuf_code, Manuf_name)
Problem: In order to know the manufacturer's name of a customer's purchased item, a join between Items and Manufacturers must be performed.
Denormalized: Invoice_items Relation (Invoice_number, Item, Manuf_code, Manuf_name, Item_qty, Item_price)
– Duplicating foreign key attributes in 1:M relationships to reduce joins
Invoice_items Relation (Invoice_number, Item, Item_qty, Item_price)
Items Relation (Item, Item_descrip, Manuf_code)
Manufacturers Relation (Manuf_code, Manuf_name)
Problem: To find the manufacturer's name of a product (e.g., Sony CDP-525) from the line items (Invoice_items relation), two joins must be made: Invoice_items to Items, and Items to Manufacturers.
Denormalized: Invoice_items Relation (Invoice_number, Item, Manuf_code, Item_qty, Item_price)
– Duplicating attributes in M:N relationships to reduce joins
If joint accounts are allowed and different types of accounts (e.g., long term, revolving) are available:
Customers Relation (Cust_account, Cust_name, Cust_addr, Zip_code, Soc_Sec_Num)
M:N
Customer_accounts Relation (Cust_account, Account_type, Credit_limit, Current_balance, Pay_history, Soc_Sec_Num)
Denormalization
Customers
Cust_account  Cust_name   …  Soc_Sec_Num
123456789     John Smith  …  123-45-6789
123456789     Jane Smith  …  987-65-4321
112233445     John Doe    …  567-32-1234
…

Customer_accounts
Cust_account  …  Soc_Sec_Num
123456789     …  123-45-6789
123456789     …  987-65-7321
543219876     …  123-45-6789
678901234     …  987-65-7321
548794133     …  567-32-1234
…
A customer can have several accounts... An account can have several owners...

Denormalization (cont.)
• Duplicating attributes in M:N relationships to reduce joins
Customers Relation (Cust_account, Cust_name, Cust_addr, Zip_code, Soc_Sec_Num)
Customer_accounts Relation (Cust_account, Account_type, Credit_limit, Current_balance, Pay_history, Soc_Sec_Num)
become
Customer_accounts Relation (Cust_account, Account_type, Credit_limit, Current_balance, Pay_history, Soc_Sec_Num, Cust_name)

Denormalization (cont.)
– Introducing repeating groups (if the number of occurrences is known and/or constant)
– Creating extract tables (in an extreme case, an unnormalized relation), which frees computing resources
(Cust_account, Account_type, Credit_limit, Current_balance, Pay_history, Soc_Sec_Num1, Cust_name1, Soc_Sec_Num2, Cust_name2)

Denormalization (cont.)
– Introduction of "codes" to
• Simplify the composite key
• Retain the original sequence
Invoice_items Relation (Invoice_number, Item, Manuf_code, Item_qty, Item_price)
becomes
Invoice_items Relation (Invoice_number, Item_number, Item, Manuf_code, Item_qty, Item_price)

[Figure: Stereos To Go invoice for customer William Tell, 2036 26 Street, Sacramento CA 95819, listing item 1 SAGX730 Pioneer Remote A/V Receiver, item 2 AT10 Cerwin Vega Loudspeakers, item 3 CDPC725 Sony Disc-Jockey CD Changer]

Denormalization
Invoice_number  Item     Manuf_code  Item_descrip            Item_qty  Item_price
10001           AT10     CV          Loudspeakers            2         359.95
10001           CDPC725  SON         Disc-Jockey CD Changer  1         399.95
10001           SAGX730  PIO         Remote A/V Receiver     1         569.95
Key: Invoice_number + Item
Problem: When retrieved from the table, these items are not in the sequence in which they appear on the original document.

Denormalization
Invoice_number  Item_number  Item     Manuf_code  Item_descrip            Item_qty  Item_price
10001           01           SAGX730  PIO         Remote A/V Receiver     1         569.95
10001           02           AT10     CV          Loudspeakers            2         359.95
10001           03           CDPC725  SON         Disc-Jockey CD Changer  1         399.95
Key: Invoice_number + Item_number

Denormalization (cont.)
– Introducing calculated attributes
• Simplify processing
Invoice_items Relation (Invoice_number, Item, Manuf_code, Item_qty, Item_price)
becomes
Invoice_items Relation (Invoice_number, Item_number, Item, Manuf_code, Item_qty, Item_price, Extended_price), where Extended_price = Item_qty × Item_price

Denormalization
Invoice_number  Item_number  Item     Manuf_code  Item_descrip            Item_qty  Item_price  Extended_price
10001           01           SAGX730  PIO         Remote A/V Receiver     1         569.95      569.95
10001           02           AT10     CV          Loudspeakers            2         359.95      719.90
10001           03           CDPC725  SON         Disc-Jockey CD Changer  1         399.95      399.95
Extended_price is the calculation Item_qty × Item_price.
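The last two denormalizations can be sketched together, assuming line items arrive in the order printed on the invoice (the helper `load_invoice` and the dict row layout are hypothetical). Item_number records the original sequence, and Extended_price is stored rather than computed on demand; the cost is that any change to Item_qty or Item_price must also update Extended_price.

```python
from decimal import Decimal


def load_invoice(invoice_number, lines):
    """Assign Item_number in document order and store the calculated Extended_price."""
    rows = []
    for seq, (item, manuf_code, qty, price) in enumerate(lines, start=1):
        rows.append({
            "Invoice_number": invoice_number,
            "Item_number": f"{seq:02d}",  # sequence code: position on the paper invoice
            "Item": item,
            "Manuf_code": manuf_code,
            "Item_qty": qty,
            "Item_price": price,
            "Extended_price": qty * price,  # redundant (derivable), stored to simplify processing
        })
    return rows


rows = load_invoice(10001, [
    ("SAGX730", "PIO", 1, Decimal("569.95")),
    ("AT10", "CV", 2, Decimal("359.95")),
    ("CDPC725", "SON", 1, Decimal("399.95")),
])

# Ordering by the Invoice_number + Item key loses the document sequence ...
by_item = sorted(rows, key=lambda r: (r["Invoice_number"], r["Item"]))
# ... while the Invoice_number + Item_number key restores it.
by_seq = sorted(rows, key=lambda r: (r["Invoice_number"], r["Item_number"]))

print([r["Item"] for r in by_item])  # ['AT10', 'CDPC725', 'SAGX730']
print([r["Item"] for r in by_seq])   # ['SAGX730', 'AT10', 'CDPC725']
```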