Database Design

Introduction

The chapter will address the following questions:
• What are the similarities and differences between conventional files and modern, relational databases?
• What are fields, records, files, and databases? What are some examples of each?
• What is a modern data architecture that includes files, operational databases, data warehouses, personal databases, and work group databases?
• What are the similarities and differences between the roles of systems analyst, data administrator, and database administrator as they relate to databases?
• What is the architecture of a database management system?
• How does a relational database implement entities, attributes, and relationships from a logical data model?
• How do you normalize a logical data model to remove impurities that can make a database unstable, inflexible, and non-scalable?
• How do you transform a logical data model into a physical, relational database schema?
• How do you generate SQL code to create the database structures in a schema?

Prepared by Kevin C. Dittman for Systems Analysis & Design Methods 4ed by J. L. Whitten & L. D. Bentley. Copyright Irwin/McGraw-Hill 1998.

Conventional Files Versus the Database: Introduction

All information systems create, read, update, and delete data. This data is stored in files and databases.
• Files are collections of similar records.
• Databases are collections of interrelated files.
  – The key word is interrelated.
  – The records in each file must allow for relationships (think of them as ‘pointers’) to the records in other files.
In the file environment, data storage is built around the applications that will use the files. In the database environment, applications are built around the integrated database.

[Figure: information systems with their own separate files, alongside information systems that share a database (consolidated & integrated data from files).]

Conventional Files Versus the Database: The Pros and Cons of Conventional Files

Pros:
• Conventional files are relatively easy to design and implement because they are normally based on a single application or information system.
• Historically, another advantage of conventional files has been processing speed.
Cons:
• Duplication of data items in multiple files is normally cited as the principal disadvantage of file-based systems.
• A significant disadvantage of files is their inflexibility and non-scalability.
As legacy file-based systems and applications become candidates for reengineering, the trend is overwhelmingly in favor of replacing them with database systems and applications.

Conventional Files Versus the Database: The Pros and Cons of Database

Pros:
• The principal advantage of a database is the ability to share the same data across multiple applications and systems.
• Database technology offers the advantage of storing data in flexible formats.
• Databases allow the data to be used in ways not originally specified by the end-users. This property is called data independence.
• The database scope can even be extended without impacting existing programs that use it.
  – New fields and record types can be added to the database without affecting current programs.
Cons:
• Database technology is more complex than file technology.
  – Special software, called a database management system (DBMS), is required.
• A DBMS is still somewhat slower than file technology.
• Database technology requires a significant investment.
  – The cost of developing databases is higher because analysts and programmers must learn how to use the DBMS.
• To achieve the benefits of database technology, analysts and database specialists must adhere to rigorous design principles.
• Another potential problem with the database approach is the increased vulnerability inherent in the use of shared data.

Conventional Files Versus the Database: Database Design in Perspective

To fully exploit the advantages of database technology, a database must be carefully designed.
• The end product is called a database schema, a technical blueprint of the database.
• Database design translates the data models that were developed for the system users during the definition phase into data structures supported by the chosen database technology.
• Subsequent to database design, system builders will construct those data structures using the language and tools of the chosen database technology.

[Figure: Information systems framework (FAST methodology). Columns: focus on system data, system processes, system interfaces, and system geography. Only the data column is filled in:
• System owners (scope), survey phase (establish scope and project plan): business subjects, e.g. "Customers order zero, one, or more products. Products may be ordered by zero, one, or more customers."
• System users (requirements), study phase (establish system improvement objectives) and definition phase (establish and prioritize business system requirements): data requirements expressed as data models.
  – PRODUCT: product-no, product-name, unit-of-measure, unit-price, quantity-available
  – CUSTOMER: customer-no, customer-name, customer-rating, balance-due
  – ORDER: order-no, order-date, products-ordered, quantities-ordered
• System designers (specification), design phase (translate business requirements into a technical design): database schema.
  – PRODUCT: product_no [Alpha(10)] INDEX, product_name [Alpha(32)], unit_of_measure [Alpha(2)], unit_price [Real(3,2)], quantity_available [Integer(4)]
  – CUSTOMER: customer_no [Alpha(10)] INDEX, customer_name [Alpha(32)], customer_rating [Alpha(1)] INDEX, balance_due [Real(5,2)]
  – ORDER: order_no [Alpha(12)] INDEX, order_date [Date(mmddyyyy)], CUSTOMER.customer_no
  – ORDER_PRODUCT: ORDER.order_no, PRODUCT.product_no, quantity_ordered [Integer(2)]
• System builders (components), implementation phase (translate technical design into code): database programs, e.g. CREATE TABLE CUSTOMER (customer_no CHAR(10) NOT NULL, customer_name CHAR(32) NOT NULL, customer_rating CHAR(1) NOT NULL, balance_due DECIMAL(5,2) ...), CREATE INDEX cust_no_idx on CUSTOMER ..., CREATE INDEX cust_rt_idx on CUSTOMER ...
• Bottom row: existing databases, interfaces, applications, and networks, and their technology.]

Database Concepts for the Systems Analyst: Fields

Fields are common to both files and databases. A field is the implementation of a data attribute.
• Fields are the smallest unit of meaningful data to be stored in a file or database.
There are four types of fields that can be stored: primary keys, secondary keys, foreign keys, and descriptive fields.
• Primary keys are fields whose values identify one and only one record in a file.
• Secondary keys are alternate identifiers for a database.
  – A single file in a database may have only one primary key, but it may have several secondary keys.
• Foreign keys are pointers to the records of a different file in a database.
  – Foreign keys are how the database ‘links’ the records of one type to those of another type.
• Descriptive fields are any other fields that store business data.

Database Concepts for the Systems Analyst: Records

Fields are organized into records. Like fields, records are common to both files and databases. A record is a collection of fields arranged in a predefined format. During systems design, records will be classified as either fixed-length or variable-length records.
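
To make the four field types and the idea of a record as a predefined collection of fields more concrete, here is a minimal sketch using Python's standard sqlite3 module. The table names, column types, and sample values are assumptions chosen for illustration (loosely echoing the CUSTOMER and ORDER examples in the framework figure); they are not taken from the textbook's schema.

```python
import sqlite3

# In-memory database; all names and sample values below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE customer (
    customer_no     TEXT PRIMARY KEY,   -- primary key: identifies one and only one record
    customer_name   TEXT NOT NULL,      -- descriptive field
    customer_rating TEXT NOT NULL,      -- descriptive field, also indexed as a secondary key
    balance_due     REAL                -- descriptive field
);
-- Secondary key: an alternate, not necessarily unique, way to find records.
CREATE INDEX cust_rt_idx ON customer (customer_rating);

CREATE TABLE customer_order (
    order_no    TEXT PRIMARY KEY,
    order_date  TEXT NOT NULL,
    customer_no TEXT NOT NULL REFERENCES customer (customer_no)  -- foreign key 'pointer'
);
""")

conn.execute("INSERT INTO customer VALUES ('C001', 'Example Co', 'A', 12.50)")
conn.execute("INSERT INTO customer_order VALUES ('O001', '1998-01-15', 'C001')")


def try_insert_order(order_no, customer_no):
    """Return True if the order is accepted, False if a key constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO customer_order VALUES (?, '1998-01-16', ?)",
            (order_no, customer_no),
        )
        return True
    except sqlite3.IntegrityError:
        return False


orphan_ok = try_insert_order("O002", "C999")     # points at a customer that does not exist
duplicate_ok = try_insert_order("O001", "C001")  # reuses an existing primary key value

# Following the foreign key from an order back to its customer record.
names = conn.execute(
    """SELECT c.customer_name
       FROM customer_order AS o
       JOIN customer AS c ON c.customer_no = o.customer_no
       WHERE o.order_no = 'O001'"""
).fetchall()
print(orphan_ok, duplicate_ok, names)
```

Both rejected inserts show the keys doing their job: the primary key refuses a second record with the same identifier, and the foreign key refuses a 'pointer' to a record that is not there.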
Most database systems impose a fixed-length record structure, meaning that each record instance has the same fields, same number of fields, and same logical size.
Variable-length record structures allow different records in the same file to have different lengths.
• Database systems typically disallow (or, at least, discourage) variable-length records.

When a computer program ‘reads’ a record from a database, it actually retrieves a group or block of records at a time. This approach minimizes the number of actual disk accesses.
• A blocking factor is the number of logical records included in a single read or write operation (from the computer’s perspective). A block is sometimes called a physical record.
• Today, the blocking factor is usually determined and optimized by the chosen database technology, but a qualified database expert may be allowed to fine-tune that blocking factor for performance.

Database Concepts for the Systems Analyst: Files and Tables

Similar records are organized into groups called files. A file is the set of all occurrences of a given record structure. In database systems, a file corresponds to a set of similar records, usually called a table. A table is the relational database equivalent of a file.
Some of the types of files and tables include:
• Master files or tables contain records that are relatively permanent.
  – Once a record has been added to a master file, it remains in the system indefinitely.
  – The values of fields for the record will change over its lifetime, but the individual records are retained indefinitely.
• Transaction files or tables contain records that describe business events.
  – The data describing these events normally has a limited useful lifetime.
  – In information systems, transaction records are frequently retained on-line for some period of time.
  – Subsequent to their useful lifetime, they are archived off-line.
• Document files and tables contain stored copies of historical data for easy retrieval and review without the overhead of regenerating the document.
• Archival files and tables contain master and transaction file records that have been deleted from on-line storage.
  – Records are rarely deleted; they are merely moved from on-line storage to off-line storage.
  – Archival requirements are dictated by government regulation and the need for subsequent audit or analysis.
• Table look-up files contain relatively static data that can be shared by applications to maintain consistency and improve performance.
• Audit files are special records of updates to other files, especially master and transaction files.
  – They are used in conjunction with archive files to recover ‘lost’ data.
  – Audit trails are typically built into better database technologies.

Database Concepts for the Systems Analyst: Databases

Databases provide for the technical implementation of entities and relationships. The history of information systems has led to one inescapable conclusion: data is a resource that must be controlled and managed! Out of necessity, database technology was created so an organization could maintain and use its data as an integrated whole instead of as separate data files.

Data Architecture:
A business’s data architecture comprises the files and databases that store all of the organization’s data, the file and database technology used to store the data, and the organization structure set up to manage the data resource.
Operational databases have been developed to support day-to-day operations and business transaction processing for major information systems.
Many information systems shops hesitate to give end-users access to operational databases, because the volume of unscheduled reports and queries could overload the computers and hamper business operations.
• To remedy that problem, data warehouses were developed.
  – Data warehouses store data that is extracted from the production databases and conventional files.
  – Fourth-generation programming languages, query tools, and decision support tools are then used to generate reports and analyses off these data warehouses.
Personal computer and local network database technology has rapidly matured to allow end-users to develop personal and departmental databases.
• These databases may contain unique data, or they may import data from conventional files, operational databases, and/or data warehouses.
To manage the enterprise-wide data resource, a staff of database specialists may be organized around the following administrators:
• A data administrator is responsible for the data planning, definition, architecture, and management.
• One or more database administrators are responsible for the database technology, database design and construction, security, backup and recovery, and performance tuning.

[Figure: A modern data architecture. Legacy file-based information systems with their own files; information systems (built in-house and purchased) on operational databases; a data warehouse reached through end-user tools and end-user applications; personal databases for individual users; and a work-group database for an end-user work group. Users and programmers work with each component.]

Database Architecture:
Database architecture refers to the database technology, including the database engine, database management utilities, database CASE tools for analysis and design, and database application development tools.
The control center of a database architecture is its database management system.
• A database management system (DBMS) is specialized computer software available from computer vendors that is used to create, access, control, and manage the database.
The core of the DBMS is often called its database engine. The engine responds to specific commands to create database structures, and then to create, read, update, and delete records in the database.
A systems analyst, or database analyst, designs the structure of the data in terms of record types, fields contained in those record types, and relationships that exist between record types. These structures are defined to the database management system using its data definition language.
• Data definition language (or DDL) is used by the DBMS to physically establish those record types, fields, and structural relationships. Additionally, the DDL defines views of the database. Views restrict the portion of a database that may be used or accessed by different users and programs. DDLs record the definitions in a permanent data repository.

[Figure: DBMS architecture. Programmers reach the DBMS through the data manipulation language (DML), optionally via a host-based or internal transaction processing (TP) monitor; systems analysts and/or database designers use the data definition language (DDL); end users use a proprietary data manipulation language and/or report writers. The DBMS manages the stored data and the metadata.]

Some data dictionaries include formal, elaborate software that helps database specialists track metadata (the data about the data), such as record and field definitions, synonyms, data relationships, validation rules, help messages, and so forth.
The database management system also provides a data manipulation language to access and use the database in applications.
• A data manipulation language (or DML) is used to create, read, update, and delete records in the database, and to navigate between different records and types of records. The DBMS and DML hide the details concerning how records are organized and allocated to the disk.

Many DBMSs don’t require the use of a DDL to construct the database, or a DML to access the database.
• They provide their own tools and commands to perform those tasks. This is especially true of PC-based DBMSs.

Many DBMSs also include proprietary report writing and inquiry tools to allow users to access and format data without directly using the DML.
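
The division of labor between DDL and DML described in the architecture slides above can be sketched with Python's standard sqlite3 module. This is an illustration under stated assumptions, not the DBMS the slides have in mind: SQLite stands in for the DBMS, the `product` table and `out_of_stock` view are hypothetical names, and the sample rows borrow product numbers from the slides' Products table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the DBMS (an assumption)

# DDL: establish a record type (table), its fields, and a view.
conn.executescript("""
CREATE TABLE product (
    product_no         TEXT PRIMARY KEY,
    product_name       TEXT NOT NULL,
    quantity_available INTEGER NOT NULL
);
-- A view restricts the portion of the database a user or program sees.
CREATE VIEW out_of_stock AS
    SELECT product_no, product_name
    FROM product
    WHERE quantity_available = 0;
""")

# DML: create, read, update, and delete records.
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("77B12", "Widget", 8000), ("77B13", "Widget", 0), ("77F01", "Gadget", 20)],
)
conn.execute(
    "UPDATE product SET quantity_available = quantity_available - 5 WHERE product_no = ?",
    ("77F01",),
)
conn.execute("DELETE FROM product WHERE product_no = ?", ("77B12",))

remaining = conn.execute(
    "SELECT product_no, quantity_available FROM product ORDER BY product_no"
).fetchall()
view_rows = conn.execute("SELECT product_no FROM out_of_stock").fetchall()

# The definitions themselves are kept as metadata; SQLite exposes them in sqlite_master.
metadata = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type IN ('table', 'view') ORDER BY name"
).fetchall()
print(remaining, view_rows, metadata)
```

Note that none of the statements say where or how the rows are laid out on disk; that detail stays hidden behind the DBMS, which is the point the DML slide makes.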
Some DBMSs include a transaction processing monitor (or TP monitor) that manages on-line accesses to the database and ensures that transactions that impact multiple tables are fully processed as a single unit.

Relational Database Management Systems:
There are several types of database management systems, and they can be classified according to the way they structure records.
• Early database management systems organized records in hierarchies or networks implemented with indexes and linked lists.
• Relational databases implement data in a series of tables that are ‘related’ to one another via foreign keys.
  – Files are seen as simple two-dimensional tables, also known as relations.
  – The rows are records.
  – The columns correspond to fields.

[Figure: data model. Customer (places) Order (sells) Ordered Product (sold on) Product.]

Customers Table
Customer Number   Customer Name       Customer Balance   …
10112             Luck Star           1455.77
10113             Pemrose             12.14
10114             Hartman             0.00
10117             K-Jack Industries   -20.00

Orders Table
Order Number   Customer Number (foreign key)   …
A633           10112
A634           10114
A635           10112

Ordered Products Table
Order Number (foreign key)   Product Number (foreign key)   Quantity Ordered
A633                         77F02                          1
A633                         77B12                          500
A634                         77B13                          100
A634                         77F01                          5
A635                         77B12                          300
A635                         77B15                          15

Products Table
Product Number   Product Description   Quantity in Stock   …
77B12            Widget                8000
77B13            Widget                0
77B15            Widget                52
77F01            Gadget                20
77F02            Gadget                2

Both the DDL and DML of most relational databases are provided by a single language, SQL (which stands for Structured Query Language).
• SQL supports not only queries, but complete database creation and maintenance.
• A fundamental characteristic of relational SQL is that commands return ‘a set’ of records, not necessarily just a single record (as in non-relational database and file technology).
High-end relational databases also extend the SQL language to support triggers and stored procedures.
• Triggers are programs embedded within a table that are automatically invoked by updates to another table.
• Stored procedures are programs embedded within a table that can be called from an application program.
Both triggers and stored procedures are reusable because they are stored with the tables themselves.
• This eliminates the need for application programmers to create the equivalent logic within each application that uses the tables.

Data Analysis for Database Design: What is a Good Data Model?

A good data model is simple. As a general rule, the data attributes that describe an entity should describe only that entity.
A good data model is essentially non-redundant. This means that each data attribute, other than foreign keys, describes at most one entity.
A good data model should be flexible and adaptable to future needs. We should make the data models as application-independent as possible to encourage database structures that can be extended or modified without impact to current programs. Prepared by Kevin C. Dittman for Systems Analysis & Design Methods 4ed by J. L. Whitten & L. D. Bentley 35 Copyright Irwin/McGraw-Hill 1998 Database Design Data Analysis for Database Design Data Analysis The technique used to improve a data model in preparation for database design is called data analysis. Data analysis is a process that prepares a data model for implementation as a simple, non-redundant, flexible, and adaptable database. The specific technique is called normalization. • Normalization is a technique that organizes data attributes such that they are grouped together to form stable, flexible, and adaptive entities. Prepared by Kevin C. Dittman for Systems Analysis & Design Methods 4ed by J. L. Whitten & L. D. Bentley 36 Copyright Irwin/McGraw-Hill 1998 Database Design Data Analysis for Database Design Data Analysis Normalization is a three-step technique that places the data model into first normal form, second normal form, and third normal form. An entity is in first normal form (1NF) if there are no attributes that can have more than one value for a single instance of the entity. An entity is in second normal form (2NF) if it is already in 1NF, and if the values of all non-primary key attributes are dependent on the full primary key – not just part of it. An entity is in third normal form (3NF) if it is already in 2NF, and if the values of its non-primary key attributes are not dependent on any other non-primary key attributes. Prepared by Kevin C. Dittman for Systems Analysis & Design Methods 4ed by J. L. Whitten & L. D. 
Bentley 37 Copyright Irwin/McGraw-Hill 1998 Database Design Data Analysis for Database Design Normalization Example First Normal Form: The first step in data analysis is to place each entity into 1NF. Prepared by Kevin C. Dittman for Systems Analysis & Design Methods 4ed by J. L. Whitten & L. D. Bentley 38 Copyright Irwin/McGraw-Hill 1998 Database Design sold PRODUCT ------------Key Data---------------Product-Number (PK1) Universal-Product-Code (PK2) --------Non-Key Data------------Quantity-in-Stock Product-Type Suggested-Retail-Price Club-Default-Unit-Price Current-Special-Unit-Price Current-Month-Units-Sold Current-Year-Units-Sold Total-Lifetime-Units-Sold MEMBER ORDER ------------------Key Data--------------------Order-Number (PK) ----------------Non-Key Data----------------Order-Creation-Date Order-Automatic-Fill-Date Member Number (FK1) Member-Name Member-Address Shipping-Address Shipping Instructions Club-Name (FK2) Promotion-Number (FK2) 0 { Ordered-Product-Description } n 0 { Ordered-Product-Title } n 1 { Quantity-Ordered } n 1 { Purchased-Unit-Price } n 1 { Extended-Price } n Order-Sub-Total-Cost Order-Sales-Tax Ship-Via-Method Shipping-Charge Order-Status Prepaid-Amount Method-of-Payment placed MEMBER ---------------------Key Data---------------------Member-Number (PK1) ------------------Non-Key Data------------------Member-Name Member-Status Member-Street-Address Member-Daytime-Phone-Number Date-of-Last-Order Member-Balance-Due Member-Bonus-Balance-Available Member-Credit-Card-Information 1 { Club-Name } n 1 { Agreement-Number } n 1 { Taste Code } n 1 { Media Preference } n 1 { Date-Enrolled } n 1 { Expiration-Date } n 1 { Number-of-Credits-Required } n 1 { Number of Credits-Earned } n enrolls in CLUB ------------------Key Data---------------------Club-Name (PK) --------------Non-Key Data-------------------Club-Description Club-Charter-Date 1 { Agreement-Number } n 1 { Agreement-Active-Date } n 1 { Agreement-Expiration-Date } n 1 { Obligation-Period } n 
1 { Required-Number-of-Credits } n 1 { Bonus-Credits-After-Obligation } n sponsors is a generates MERCHANDISE -------------Key Data--------------Product-Number (PK1) Universal-Product-Code (PK1) ---------Non-Key Data-----------Merchandise-Name Merchandise-Description Merchandise-Size Merchasnise-Color Unit-of-Measure TITLE --------------Key Data-------------Product-Number (PK1) Universal-Product-Code (PK2) ----------Non-Key Data----------Title-of-Work Title-Cover Catalog-Description Copyright-Date Entertainment-Category Credit-Value features PROMOTION ---------Key Data------------Club-Name (PK1) Promotion-Number (PK1) -------Non-Key Data-------Product-Number (FK1) Promotion-Release-Date Promotion-Status Promotion-Type Automatic-Fill-Delay is a AUDIO TITLE -------------Key Data--------------Product-Number (PK1) Universal-Product-Code (PK1) ---------Non-Key Data-----------Artist Audio-Category Audio-Sub-Category Number-of-Units-in-Package Audio-Media-Code Content-Advisory-Code Prepared by Kevin C. Dittman for Systems Analysis & Design Methods 4ed by J. L. Whitten & L. D. 
Bentley VIDEO TITLE -------------Key Data--------------Product-Number (PK1) Universal-Product-Code (PK1) ---------Non-Key Data-----------Producer Director Video-Category Video-Sub-Category Closed-Captioned Language Running-Time Video-media-Type Video-Encoding Screen-Aspect MPA-Rating-Code GAME TITLE -------------Key Data--------------Product-Number (PK1) Universal-Product-Code (PK1) ---------Non-Key Data-----------Manufacturer Game-Category Game-Sub-Category Game-Platform Game-Media-Type Number-of-Players Parent-Advisory-Code 39 Copyright Irwin/McGraw-Hill 1998 Database Design MEMBER ORDER (1NF) ------------------Key Data--------------------Order-Number (PK) ----------------Non-Key Data----------------Order-Creation-Date Order-Automatic-Fill-Date Member Number (FK1) Member-Name Member-Address Shipping-Address Shipping Instructions Club-Name (FK2) Order-Sub-Total-Cost Order-Sales-Tax Ship-Via-Method Shipping-Charge Order-Status Prepaid-Amount MEMBER ORDER (unnormalized) ------------------KeyData--------------------Order-Number (PK) ---------------Non-Key Data----------------Order-Creation-Date Order-Automatic-Fill-Date Member Number (FK1) Member-Name Member-Address Shipping-Address Shipping Instructions Club-Name (FK2) Promotion-Number (FK2) 0 { Ordered-Product-Description } n 0 { Ordered-Product-Title } n 1 { Quantity-Ordered } n 1 { Purchased-Unit-Price } n 1 { Extended-Price } n Order-Sub-Total-Cost Order-Sales-Tax Ship-Via-Method Shipping-Charge Order-Status Prepaid-Amount Method-of-Payment sells CORRECTION MEMBER ORDERED PRODUCT (1NF) ---------------Key Data-----------------Member-Number (PK1) (FK) Product-Number (PK1) (FK) -------------Non-Key Data------------Ordered-Product-Description Ordered-Product-Title Quantity-Ordered Purchased-Unit-Price Extended-Price sold as PRODUCT (1NF) ------------Key Data---------------Product-Number (PK1) Universal-Product-Code (PK2) --------Non-Key Data------------Quantity-in-Stock Product-Type Suggested-Retail-Price 
Club-Default-Unit-Price Current-Special-Unit-Price Current-Month-Units-Sold Current-Year-Units-Sold Total-Lifetime-Units-Sold Prepared by Kevin C. Dittman for Systems Analysis & Design Methods 4ed by J. L. Whitten & L. D. Bentley 40 Copyright Irwin/McGraw-Hill 1998 Database Design CLUB (1NF) ------------------Key Data---------------------Club-Name (PK) --------------Non-Key Data-------------------Club-Description Club-Charter-Date establishes CLUB (unnormalized) ------------------Key Data---------------------Club-Name (PK) --------------Non-Key Data-------------------Club-Description Club-Charter-Date 1 { Agreement-Number } n 1 { Agreement-Active-Date } n 1 { Agreement-Expiration-Date } n 1 { Obligation-Period } n 1 { Required-Number-of-Credits } n 1 { Bonus-Credits-After-Obligation } n Prepared by Kevin C. Dittman for Systems Analysis & Design Methods 4ed by J. L. Whitten & L. D. Bentley CORRECTION 41 AGREEMENT (1NF) ----------Key Data----------------Club-Name (PK1) (FK) Agreement-Number (PK1) --------Non-Key Data------------Agreement-Active-Date Agreement-Expiration-Date Obligation-Period Required-Number-of-Credits Bonus-Credits-After-Obligation Copyright Irwin/McGraw-Hill 1998 Database Design MEMBER (1NF) ---------------------Key Data---------------------Member-Number (PK1) ------------------Non-Key Data------------------Member-Name Member-Status Member-Street-Address Member-Daytime-Phone-Number Date-of-Last-Order Member-Balance-Due Member-Bonus-Balance-Available Member-Credit-Card-Information MEMBER (unnormalized) ---------------------Key Data---------------------Member-Number (PK1) ------------------Non-Key Data------------------Member-Name Member-Status Member-Address Member-Daytime-Phone-Number Date-of-Last-Order Member-Balance-Due Member-Bonus-Balance-Available Member-Credit-Card-Information 1 { Club-Name } n 1 { Agreement-Number } n 1 { Taste Code } n 1 { Media Preference } n 1 { Date-Enrolled } n 1 { Expiration-Date } n 1 { Number-of-Credits-Required } n 
1 { Number of Credits-Earned } n

(relationship: enrolls in)

CLUB MEMBERSHIP (1NF)
Key Data: Member-Number (PK1) (FK), Club-Name (PK1) (FK), Agreement-Number (PK1) (FK)
Non-Key Data: Taste Code, Media Preference, Date-Enrolled, Expiration-Date, Number-of-Credits-Required, Number of Credits-Earned

(relationship: binds)

AGREEMENT (1NF)
Key Data: Club-Name (PK1) (FK), Agreement-Number (PK1)
Non-Key Data: Agreement-Active-Date, Agreement-Expiration-Date, Obligation-Period, Required-Number-of-Credits, Bonus-Credits-After-Obligation

(relationships: sponsors, establishes)

CLUB (1NF)
Key Data: Club-Name (PK)
Non-Key Data: Club-Description, Club-Charter-Date

Data Analysis for Database Design
Normalization Example

Second Normal Form: The next step of data analysis is to place the entities into 2NF.
• It is assumed that you have already placed all entities into 1NF.
• 2NF looks for an anomaly called a partial dependency: one or more attributes whose values are determined by only part of the primary key.
• Entities that have a single-attribute primary key are already in 2NF.
• Only those entities that have a concatenated key need to be checked.
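The partial-dependency check described above can be sketched in Python. This is only an illustration, not part of the original slides: the helper names `determines` and `partial_dependencies` and the sample rows (loosely modeled on a MEMBER ORDERED PRODUCT entity) are hypothetical. Sample data can only suggest candidate dependencies; the real ones come from the business rules.

```python
from itertools import combinations

def determines(rows, lhs, attr):
    """Return True if, in the sample rows, the columns in lhs fix the value of attr."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if seen.setdefault(key, row[attr]) != row[attr]:
            return False
    return True

def partial_dependencies(rows, key, non_key):
    """List (attribute, key subset) pairs where only part of a concatenated key determines the attribute."""
    found = []
    for size in range(1, len(key)):          # proper subsets of the key only
        for subset in combinations(key, size):
            for attr in non_key:
                if determines(rows, subset, attr):
                    found.append((attr, subset))
    return found

# Hypothetical sample rows; the values are invented for illustration.
rows = [
    {"Member-Number": 1, "Product-Number": 10, "Ordered-Product-Title": "Album A", "Quantity-Ordered": 2},
    {"Member-Number": 2, "Product-Number": 10, "Ordered-Product-Title": "Album A", "Quantity-Ordered": 1},
    {"Member-Number": 1, "Product-Number": 20, "Ordered-Product-Title": "Album B", "Quantity-Ordered": 1},
    {"Member-Number": 2, "Product-Number": 20, "Ordered-Product-Title": "Album B", "Quantity-Ordered": 3},
]
key = ("Member-Number", "Product-Number")
non_key = ("Ordered-Product-Title", "Quantity-Ordered")

suspects = partial_dependencies(rows, key, non_key)
print(suspects)
```

In this sample the title depends on Product-Number alone, so it is flagged as a candidate to move into a separate entity, while the quantity depends on the whole key and stays.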
MEMBER ORDERED PRODUCT (1NF)
Key Data: Member-Number (PK1) (FK), Product-Number (PK1) (FK)
Non-Key Data: Ordered-Product-Description, Ordered-Product-Title, Quantity-Ordered, Purchased-Unit-Price, Extended-Price

MEMBER ORDERED PRODUCT (2NF)
Key Data: Member-Number (PK1) (FK), Product-Number (PK1) (FK)
Non-Key Data: Quantity-Ordered, Purchased-Unit-Price, Extended-Price

(relationship: sold as)

PRODUCT (2NF)
Key Data: Product-Number (PK1), Universal-Product-Code (PK2)
Non-Key Data: Quantity-in-Stock, Product-Type, Suggested-Retail-Price, Club-Default-Unit-Price, Current-Special-Unit-Price, Current-Month-Units-Sold, Current-Year-Units-Sold, Total-Lifetime-Units-Sold

(relationship: is a)

MERCHANDISE (2NF)
Key Data: Product-Number (PK1), Universal-Product-Code (PK1)
Non-Key Data: Merchandise-Name, Merchandise-Description, Merchandise-Size, Merchandise-Color, Unit-of-Measure

TITLE (2NF)
Key Data: Product-Number (PK1), Universal-Product-Code (PK2)
Non-Key Data: Title-of-Work, Title-Cover, Catalog-Description, Copyright-Date, Entertainment-Category, Credit-Value

Data Analysis for Database Design
Normalization Example

Third Normal Form: Entities are assumed to be in 2NF before beginning 3NF analysis. Third normal form analysis looks for two types of problems: derived data and transitive dependencies.
• In both cases, the fundamental error is that non-key attributes are dependent on other non-key attributes.
• Derived attributes are those whose values can either be calculated from other attributes or derived through logic from the values of other attributes.
• A transitive dependency exists when a non-key attribute is dependent on another non-key attribute (other than by derivation).
  – This error usually indicates that an undiscovered entity is still embedded within the problem entity.
• Transitive analysis is only performed on those entities that do not have a concatenated key.

“An entity is said to be in third normal form if every nonprimary key attribute is dependent on the primary key, the whole primary key, and nothing but the primary key.”

MEMBER ORDERED PRODUCT (2NF)
Key Data: Member-Number (PK1) (FK), Product-Number (PK1) (FK)
Non-Key Data: Quantity-Ordered, Purchased-Unit-Price, Extended-Price
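Extended-Price in the entity above looks like a candidate for the derived-data check, assuming it is simply Quantity-Ordered times Purchased-Unit-Price (the slides do not state the formula, so treat that as an assumption). A minimal sketch using Python's built-in sqlite3 module shows the value being calculated at query time instead of being stored; the table and column names are hypothetical renderings of the entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Only the non-derived attributes are stored (assumed layout, for illustration).
conn.execute("""
    CREATE TABLE member_ordered_product (
        member_number        INTEGER NOT NULL,
        product_number       INTEGER NOT NULL,
        quantity_ordered     INTEGER NOT NULL,
        purchased_unit_price REAL    NOT NULL,
        PRIMARY KEY (member_number, product_number)
    )
""")
conn.execute("INSERT INTO member_ordered_product VALUES (1, 10, 3, 4.5)")

# The extended price is calculated when needed, so it cannot drift out of
# step with the quantity and unit price it is (assumed to be) derived from.
row = conn.execute("""
    SELECT quantity_ordered * purchased_unit_price AS extended_price
    FROM member_ordered_product
    WHERE member_number = 1 AND product_number = 10
""").fetchone()
extended_price = row[0]
print(extended_price)
```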
MEMBER ORDERED PRODUCT (3NF)
Key Data: Member-Number (PK1) (FK), Product-Number (PK1) (FK)
Non-Key Data: Quantity-Ordered, Purchased-Unit-Price, Extended-Price

MEMBER (3NF)
Key Data: Member-Number (PK1)
Non-Key Data: Member-Name, Member-Status, Member-Street-Address, Member-Daytime-Phone-Number, Date-of-Last-Order, Member-Balance-Due, Member-Bonus-Balance-Available, Member-Credit-Card-Information

(relationship: placed)

MEMBER ORDER (2NF)
Key Data: Order-Number (PK)
Non-Key Data: Order-Creation-Date, Order-Automatic-Fill-Date, Member Number (FK1), Member-Name, Member-Address, Shipping-Address, Shipping Instructions, Club-Name (FK2), Order-Sub-Total-Cost, Order-Sales-Tax, Ship-Via-Method, Shipping-Charge, Order-Status, Prepaid-Amount

MEMBER ORDER (3NF)
Key Data: Order-Number (PK)
Non-Key Data: Order-Creation-Date, Order-Automatic-Fill-Date, Member Number (FK1), Member-Name, Member-Address, Shipping-Address, Shipping Instructions, Club-Name (FK2), Order-Sub-Total-Cost, Order-Sales-Tax, Ship-Via-Method, Shipping-Charge, Order-Status, Prepaid-Amount

Data Analysis for Database Design
Normalization Example

Simplification by Inspection: When several analysts work on a common application, it is not unusual to create problems that won’t be taken care of by normalization.
• These problems are best solved through simplification by inspection, a process wherein a data entity in 3NF is further simplified by such efforts as addressing subtle data redundancy.
Data Analysis for Database Design
Normalization Example

CASE Support for Normalization: Most CASE tools can only normalize to first normal form.
• They accomplish this in one of two ways.
  – They look for many-to-many relationships and resolve those relationships into associative entities.
  – They look for attributes specifically described as having multiple values for a single entity instance.
It is exceedingly difficult for a CASE tool to identify second and third normal form errors.
• That would require the CASE tool to have the intelligence to recognize partial and transitive dependencies.

File Design
Introduction

Most fundamental entities from the data model would be designed as master or transaction records. Master files typically contain fixed-length records. Associative entities from the data model are typically joined into the transaction records to form variable-length records (based on the one-to-many relationships). Other types of files (not represented in the data model) are added as necessary.

Two important considerations of file design are file access and organization. The systems analyst usually studies how each program will access the records in the file (‘sequentially’ or ‘randomly’) and then selects an appropriate file organization.

Database Design
Introduction

The design of any database will usually involve the DBA and database staff. They will handle the technical details and cross-application issues. It is useful for the systems analyst to understand the basic design principles for relational databases.
Database Design
Goals and Prerequisites to Database Design

The goals of database design are as follows:
• A database should provide for the efficient storage, update, and retrieval of data.
• A database should be reliable: the stored data should have high integrity to promote user trust in that data.
• A database should be adaptable and scalable to new and unforeseen requirements and applications.

The data model may have to be divided into multiple data models to reflect database distribution and database replication decisions.
• Data distribution refers to the distribution of specific tables, records, and/or fields to different physical databases.
• Data replication refers to the duplication of specific tables, records, and/or fields to multiple physical databases.
• Each sub-model or view should reflect the data to be stored on a single server.

Database Design
The Database Schema

The design of a database is depicted as a special model called a database schema.
• A database schema is the physical model or blueprint for a database. It represents the technical implementation of the logical data model.
• A relational database schema defines the database structure in terms of tables, keys, indexes, and integrity rules.
• A database schema specifies details based on the capabilities, terminology, and constraints of the chosen database management system.
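To make "tables, keys, indexes, and integrity rules" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column types, the default of 12, and the index name are assumptions chosen for illustration. The tables are cut-down, hypothetical versions of the CLUB and AGREEMENT entities, and another DBMS would use its own data types and terminology.

```python
import sqlite3

# Hypothetical physical schema for two entities from the logical model.
DDL = """
CREATE TABLE club (
    club_name         TEXT PRIMARY KEY,
    club_description  TEXT,
    club_charter_date TEXT NOT NULL
);
CREATE TABLE agreement (
    club_name                  TEXT    NOT NULL REFERENCES club (club_name),
    agreement_number           INTEGER NOT NULL,
    obligation_period          INTEGER NOT NULL DEFAULT 12,
    required_number_of_credits INTEGER NOT NULL CHECK (required_number_of_credits >= 0),
    PRIMARY KEY (club_name, agreement_number)
);
CREATE INDEX agreement_by_period ON agreement (obligation_period);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys unless asked
conn.executescript(DDL)

conn.execute("INSERT INTO club VALUES ('Music', 'Compact discs', '1998-01-01')")
conn.execute(
    "INSERT INTO agreement (club_name, agreement_number, required_number_of_credits) "
    "VALUES ('Music', 1, 6)"
)
# The omitted obligation_period is filled in from the DEFAULT clause.
period = conn.execute("SELECT obligation_period FROM agreement").fetchone()[0]

def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Domain rule: a negative credit count violates the CHECK constraint.
domain_rule_enforced = rejected(
    "INSERT INTO agreement (club_name, agreement_number, required_number_of_credits) "
    "VALUES ('Music', 2, -1)"
)
# Referential rule: the foreign key must match an existing club.
foreign_key_enforced = rejected(
    "INSERT INTO agreement (club_name, agreement_number, required_number_of_credits) "
    "VALUES ('No Such Club', 1, 6)"
)
print(period, domain_rule_enforced, foreign_key_enforced)
```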
Database Design
The Database Schema

Rules and guidelines for transforming the logical data model into a physical relational database schema:
1 Each fundamental, associative, and weak entity is implemented as a separate table.
• The primary key is identified as such and implemented as an index into the table.
• Each secondary key is implemented as its own index into the table.
• Each foreign key will be implemented as such.
• Attributes will be implemented with fields.
  – These fields correspond to columns in the table.
• The following technical details must usually be specified for each attribute.
  – Data type. Each DBMS supports different data types, and terms for those data types.
  – Size of the Field. Different DBMSs express precision of real numbers differently.
  – NULL or NOT NULL. Must the field have a value before the record can be committed to storage?
  – Domains. Many DBMSs can automatically edit data to ensure that fields contain legal data.
  – Default. Many DBMSs allow a default value to be automatically set in the event that a user or programmer submits a record without a value.
2 Supertype/subtype entities present additional options as follows:
• Most CASE tools do not currently support object-like constructs such as supertypes and subtypes.
• Most CASE tools default to creating a separate table for each entity supertype and subtype.
• If the subtypes are of similar size and data content, a database administrator may elect to collapse the subtypes into the supertype to create a single table.
3 Evaluate and specify referential integrity constraints.

Database Design
Data and Referential Integrity

There are at least three types of data integrity that must be designed into any database: key integrity, domain integrity, and referential integrity.

Key Integrity: Every table should have a primary key (which may be concatenated).
• The primary key must be controlled such that no two records in the table have the same primary key value.
• The primary key for a record must never be allowed to have a NULL value.

Domain Integrity: Appropriate controls must be designed to ensure that no field takes on a value that is outside of the range of legal values.

Referential Integrity: A referential integrity error exists when a foreign key value in one table has no matching primary key value in the related table.

Referential integrity is specified in the form of deletion rules as follows:
• No restriction.
  – Any record in the table may be deleted without regard to any records in any other tables.
• Delete:Cascade.
  – A deletion of a record in the table must be automatically followed by the deletion of matching records in a related table.
• Delete:Restrict.
  – A deletion of a record in the table must be disallowed until any matching records are deleted from a related table.
• Delete:Set Null.
  – A deletion of a record in the table must be automatically followed by setting any matching keys in a related table to the value NULL.

Database Design
Roles

Some database shops insist that no two fields have exactly the same name. This presents an obvious problem with foreign keys.
• A role name is an alternate name for a foreign key that clearly distinguishes the purpose that the foreign key serves in the table.
• The decision to require role names or not is usually established by the data or database administrator.

Database Design
Database Prototypes

Prototyping is not an alternative to carefully thought-out database schemas. On the other hand, once the schema is completed, a prototype database can usually be generated very quickly.
• Most modern DBMSs include powerful, menu-driven database generators that automatically create a DDL and generate a prototype database from that DDL.
• A database can then be loaded with test data that will prove useful for prototyping and testing outputs, inputs, screens, and other systems components.
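A prototype database loaded with test data is also a convenient place to try out the deletion rules described above. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The table layout is a simplified, hypothetical one (`ordered_product` is keyed by order number here purely for illustration), and the choice of rule per foreign key is an assumption, not something the slides prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign key enforcement off by default

# Hypothetical prototype schema plus a few rows of test data.
conn.executescript("""
CREATE TABLE member (member_number INTEGER PRIMARY KEY, member_name TEXT NOT NULL);
CREATE TABLE club   (club_name TEXT PRIMARY KEY);
CREATE TABLE member_order (
    order_number  INTEGER PRIMARY KEY,
    member_number INTEGER NOT NULL REFERENCES member (member_number) ON DELETE RESTRICT,
    club_name     TEXT REFERENCES club (club_name) ON DELETE SET NULL
);
CREATE TABLE ordered_product (
    order_number   INTEGER NOT NULL REFERENCES member_order (order_number) ON DELETE CASCADE,
    product_number INTEGER NOT NULL,
    PRIMARY KEY (order_number, product_number)
);
INSERT INTO member VALUES (1, 'Test Member');
INSERT INTO club VALUES ('Music');
INSERT INTO member_order VALUES (100, 1, 'Music');
INSERT INTO ordered_product VALUES (100, 10), (100, 20);
""")

# Delete:Restrict. The member still has an order, so the delete is refused.
try:
    conn.execute("DELETE FROM member WHERE member_number = 1")
    restrict_blocked = False
except sqlite3.IntegrityError:
    restrict_blocked = True

# Delete:Set Null. Deleting the club blanks the matching foreign key in the order.
conn.execute("DELETE FROM club WHERE club_name = 'Music'")
club_after = conn.execute(
    "SELECT club_name FROM member_order WHERE order_number = 100"
).fetchone()[0]

# Delete:Cascade. Deleting the order removes its ordered products as well.
conn.execute("DELETE FROM member_order WHERE order_number = 100")
lines_left = conn.execute("SELECT COUNT(*) FROM ordered_product").fetchone()[0]

print(restrict_blocked, club_after, lines_left)
```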
Database Design
Database Capacity Planning

A database is stored on disk. The database administrator will want an estimate of disk capacity for the new database to ensure that sufficient disk space is available. Database capacity planning can be calculated with simple arithmetic as follows.
1 For each table, sum the field sizes.
• This is the record size for the table.
2 For each table, multiply the record size times the number of entity instances to be included in the table.
• This is the table size.
3 Sum the table sizes.
• This is the database size.
4 Optionally, add a slack capacity buffer (e.g., 10%) to account for unanticipated factors or inaccurate estimates above.
• This is the anticipated database capacity.

Database Design
Database Structure Generation

CASE tools are frequently capable of generating SQL code for the database directly from a CASE-based database schema.
• This code can be exported to the DBMS for compilation.
• Even a small database model can require 50 pages or more of SQL data definition language code to create the tables, indexes, keys, fields, and triggers.
• Clearly, a CASE tool’s ability to automatically generate syntactically correct code is an enormous productivity advantage.
• Furthermore, it almost always proves easier to modify the database schema and re-generate the code than to maintain the code directly.
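Both the capacity arithmetic and the idea of generating DDL from a stored schema description can be sketched in a few lines of Python. Everything here is hypothetical: the `SCHEMA` dictionary stands in for a CASE tool's repository, the field sizes and row counts are invented, and a real estimate would also need DBMS-specific overhead such as indexes, which the four steps above do not cover.

```python
# Hypothetical schema description:
# table name -> (expected number of records, [(field, SQL type, size in bytes)])
SCHEMA = {
    "club": (50, [("club_name", "CHAR(30)", 30), ("club_charter_date", "CHAR(10)", 10)]),
    "member": (10_000, [("member_number", "INTEGER", 4), ("member_name", "CHAR(40)", 40)]),
}

def generate_ddl(schema):
    """Emit one CREATE TABLE statement per table; the first field is treated as the primary key."""
    statements = []
    for table, (_, fields) in schema.items():
        columns = [f"    {name} {sql_type}" for name, sql_type, _ in fields]
        columns.append(f"    PRIMARY KEY ({fields[0][0]})")
        statements.append(f"CREATE TABLE {table} (\n" + ",\n".join(columns) + "\n);")
    return "\n".join(statements)

def estimate_capacity(schema, slack_percent=10):
    """Apply capacity-planning steps 1 to 4 and return the anticipated capacity in bytes."""
    database_size = 0
    for rows, fields in schema.values():
        record_size = sum(size for _, _, size in fields)  # step 1: record size
        database_size += record_size * rows               # steps 2 and 3: table size, summed
    return database_size * (100 + slack_percent) // 100   # step 4: slack buffer

ddl = generate_ddl(SCHEMA)
capacity = estimate_capacity(SCHEMA)
print(ddl)
print(capacity)
```

Because both functions read the same description, changing a field size in `SCHEMA` updates the generated code and the estimate together, which is the same reason the slides favor regenerating code over maintaining it by hand.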
The Next Generation of Database Design
Introduction

Relational database technology is widely deployed and used in contemporary information system shops.
• One new technology is slowly emerging that could ultimately change the landscape dramatically: object database management systems.
• The heir apparent to relational DBMSs, object database management systems store true objects, that is, encapsulated data and all of the processes that can act on that data.
• Because relational database management systems are so widely used, we don’t expect this change to happen quickly.
• It is expected that relational DBMS vendors will either build object technology into their existing relational DBMSs or create new object DBMSs and provide for the transition between relational and object models.

Summary
• Introduction
• Conventional Files Versus the Database
• Database Concepts for the Systems Analyst
• Data Analysis for Database Design
• File Design
• Database Design
• The Next Generation of Database Design