Database Management Systems & Programming LIS 558 - Week 2 Entity Relationship Modeling I Faculty of Information & Media Studies Summer 2000 Class Outline Database Models Database Design Problems Conceptual Design Methodology Break Purpose of Data Modeling Entity-Relationship Design E-R Terminology Entity-Relationship Method Examples Database Models A data model is the relatively simple representation, usually graphic, of complex real-world data structures. It represents data structures and their characteristics, relations, constraints, and transformations. The database designer usually employs data models as communications tools to facilitate the interaction among the designer, the applications programmer, and the end user. A good database is the foundation for good applications. Database Models Two Categories of Database Models • Conceptual models focus on the logical nature of the data representation. They are concerned with what is represented rather than how it is represented. • Implementation models place the emphasis on how the data are represented in the database or on how the data structures are implemented. Database Models Three Types of Implementation Database Models • Hierarchical database model • Network database model • Relational database model A Hierarchical Structure Database Models Hierarchical Database Model • Basic Structure – Collection of records logically organized to conform to the upside-down tree (hierarchical) structure. – The top layer is perceived as the parent of the segment directly beneath it. – The segments below other segments are the children of the segment above them. – A tree structure is represented as a hierarchical path on the computer’s storage media. Database Models Hierarchical Database Model • Advantages – – – – – Conceptual simplicity Database security Data independence Database integrity Efficiency dealing with a large database • Disadvantages – – – – – – Complex implementation Difficult to manage Lacks structural independence Applications programming and use complexity Implementation limitations Lack of standards Database Models Network Database Model • Basic Structure – Set -- A relationship is called a set. Each set is composed of at least two record types: an owner (parent) record and a member (child) record. – A set is represents a 1:M relationship between the owner and the member. A Network Database Model Database Models Network Database Model • Advantages – – – – – – Conceptual simplicity Handles more relationship types Data access flexibility Promotes database integrity Data independence Conformance to standards • Disadvantages – System complexity – Lack of structural independencem Evolution of Conceptual Data Models The Evolution of Conceptual Data Models Common characteristics required for data models: • A data model must show some degree of conceptual simplicity without compromising the semantic completeness. • A data model must represent the real world as closely as possible. • The representation of the real-world transformations (behavior) must be in compliance with the consistency and integrity characteristics of any data model. Database Models Relational database • Codd, E.F. (1970). A relational model for large shared data banks. CACM, 13(6), 377-87. –First relational prototype - IBM’s system/R –Other variants, e.g., INGRES - UC, Berkeley –1983 IBM released DB2 Relational databases required considerable computing resources (not feasible until mid- 1980s) –low end (Access, Paradox, dBase, FoxPro, Clipper) –high end (DB2, Oracle, Sybase, Informix, INGRES commercial) Database Models Relational Database defined: • Logical database model that treats data as if they are stored in separate two-dimensional but related tables • Each table consists of data elements describing a common theme among which is one (or more) elements that uniquely describes each record in the table • Tables are related as long as two tables share a common data element • Information in these tables cam be combined on an asneeded basis to get answers to queries and generate complex reports Database Models Advantages • • • • • • • But #1 problem is still Mechanisms for minimizing data redundancy and inconsistency! inconsistency Logical database design is separated from physical aspects Relatively program-data independent Management of data for access, manipulation, and security Flexible mechanisms for generating reports and queries Program development and maintenance costs are reduced Data can be accessed in a multiplicity of ways within and amongst organizations Disadvantages • Ease of use - many untrained people create and use databases without considering its design - usually incorporate many errors • Processing resources required The Relational Database Model Agents Clients Instruments Entertainers Engagements Entertainer styles • represented by tables (like spreadsheets) • tables are linked with software pointers • unlike earlier systems, all three types of relationships can be represented • accommodates the design of larger databases that involve complex relationships and intricate manipulations Linking Relational Tables Database Models Entity-Relationship Data Model • It is one of the most widely accepted graphical data modeling tools. • It graphically represents data as entities and their relationships in a database structure. • It complements the relational data model concepts. Database Models E-R model is commonly used to: • Translate different views of data among managers, users, and programmers to fit into a common framework. • Define data processing and constraint requirements to help us meet the different views. • Help implement the database. Database Models Entity Relationship Data Model • Basic Structure – E-R models are normally represented in an entity relationship diagram (ERD). – An entity is represented by a rectangle. – Each entity is described by a set of attributes. An attribute describes a particular characteristics of the entity. – A relationship is represented by a diamond connected to the related entities. Database Models Three Types of Relationships • One-to-many relationships (1:M) – A painter paints many different paintings, but each one of them is painted by only that painter. • PAINTER (1) paints PAINTING (M) • Many-to-many relationships (M:N) – An employee might learn many job skills, and each job skill might be learned by many employees. • EMPLOYEE (M) learns SKILL (N) • One-to-one relationships (1:1) – Each store is managed by a single employee and each store manager (employee) only manages a single store. • EMPLOYEE (1) manages STORE (1) Relationship Depiction: The ERD Database Models Entity-Relationship Data Model • Advantages – Exceptional conceptual simplicity – Visual representation – Effective communication tool – Integrated with the relational database model • Disadvantages – Limited constraint representation – Limited relationship representation – No data manipulation language – Loss of information content Introduction to Database Design Database schema defines database’s structure, tables, relationships, domains, and constraint rules Tables • BOOK (ISBN, Title, AuthID, PubID, Price) • PUBLISHER (PubID, PubName, PubPhone) • AUTHOR (AuthID, AuthName, AuthPhone) Relationships • Each book is published by one and only one publisher • Each publisher publishes one or more books Domains (set of values in a column) • Physical description (e.g., set of integers 0 < x < 99999) Constraints (business rules) • Price cannot be less than zero; Author phone field cannot be left blank Database Terminology Tables hold the data Database design is the process of separating information into multiple tables that are related to each other Single table designs work only for the simplest of situations Anomalies often arise in single table designs as a result of inserting, deleting, or updating records Table Users view their data in two-dimensional tables. table = file = relation Field The fields within records contain data. Data within a field must be of the same data type. Each field within a table must have a unique name. Order of fields is unimportant. column = field = attribute Record A record is a group of related fields of information about a single instance of one object or event in a database. Tables consist of zero, one, or more records. Order of rows is unimportant. row = record = tuple Data Dictionary and the System Catalog Data dictionary contains metadata to provide detailed accounting of all tables within the database. System catalog is a very detailed system data dictionary that describes all objects within the database. • System catalog is a system-created database whose tables store the database characteristics and contents. • System catalog tables can be queried just like any other tables. • System catalog automatically produces database documentation. Sample Data Dictionary Keys A key helps define entity relationships. The key’s role is based on a concept known as determination, which is used in the definition of functional dependence. – The attribute B is functionally dependent on A if A determines B. – An attribute that is part of a key is known as a key attribute. – A multi-attribute key is known as a composite key. – If the attribute (B) is functionally dependent on a composite key (A) but not on any subset of that composite key, the attribute (B) is fully functionally dependent on (A). Keys Controlled redundancy (shared common attributes) makes the relational database work. The primary key of one table appears again as the link (foreign key) in another table. If the foreign key contains either matching values or nulls, the table(s) that make use of such a foreign key are said to exhibit referential integrity. Indexes An index is composed of an index key and a set of pointers. Relational Database Keys Integrity Rules Database Components Tables Queries Forms Reports Modules Major Components of a Database Application 1. Form- data entry 3. Query- asks questions 2. Report- summarizes & prints 4. Menu - organizes components 5. Program - used to automate a database Levels of Database Representation Three levels of representation • external • conceptual • internal - user views of data - abstract description of data - physical implementation access methods, index construction, data structures Starting point for design? • Conceptual general description which is then represented in terms of the data contained in the database Conceptual level is primary focus of this course A Logical View of Data Relational database model’s structural and data independence enables us to view data logically rather than physically. The logical view allows a simpler file concept of data storage. The use of logically independent tables is easier to understand. Logical simplicity yields simpler and more effective database design methodologies. What is Relational Database Design? Relational Database Summary • logical database model that treats data as if they are stored in separate but related tables • Tables are related as long as two tables share common data element • Information in these tables can be combined on an as-needed basis to retrieve answers to queries and to generate complex reports What is Relational Database Design? Why use a Relational Data Model Design? Small databases can easily be maintained as a single flat file (like a spreadsheet) For design of larger databases that involve complex relationships and intricate manipulations, a relational model is essential Originally the major limitation of relational model was memory and processing speed but this is no longer the case Relational design model resolves a number of potential problems Database Design Problems Numerous anomalies can arise during the design of databases • • • • • Redundancy Multi-valued problems Update anomalies Insertion anomalies Deletion anomalies Database Design Problems Redundancy • unnecessary repetition of data Database Design Problems Multi-valued problems • e.g., 1 - Add multiple rows, one for each value •Data about a book must be repeated for as many times as there are authors of a book (also creates redundancy) Database Design Problems Multi-valued problems • e.g., 2 - Add multiple columns, one for each value •How many columns for authors must be included in the design (empty fields waste space too)? Database Design Problems Multi-valued problems • e.g., 3 - Include all author’s names in a single field •How do you search for a single author’s name or create an alphabetical list of authors Database Design Problems Update Anomalies • To update an author’s telephone, each instance must be changed • if we miss an item or enter it incorrectly we create an unreliable table • sometimes previous errors propagate errors further Database Design Problems Update Anomalies • e.g., Consider the author Austen in the following table. What happens if we change her telephone number? Propagation of Error Database Design Problems Insertion anomalies • What happens if we want to enter information regarding a publisher for whom we do not have book information? • Do we add null values for the other fields? Database Design Problems Deletion anomalies • What happens if we delete all the book entries for a given publisher? Database Design Problems Use of the relational database model removes some database anomalies Further removal of database anomalies relies on a structured technique called normalization Proper use of foreign keys is crucial to exercising data redundancy control Presence of some of these anomalies is sometimes justified in order to enhance performance Class Outline Database Models Database Design Problems Design Exercise Break Database Design Methodology Purpose of Data Modeling Entity-Relationship Design E-R Terminology Entity-Relationship Method Examples Characteristics of a Database designer Knowledge of the problem you are trying to solve Communication skills - extensive discussions with users Analytical aptitude - keep in mind the broad goals even while poring over the smallest details Impertinence - question everything! Impartiality - find best solution Relax constraints - assume anything is possible Pay attention to details and definitions Reframing - iteratively analyze in new way - be creative! A good designer combines the art of design with the science of design. Conceptual Design Methodology 1. Define the problem and define database objectives 2. Analyze current database, assess user requirements, and create data model 3. Design data structures (tables, fields, field specifications, establish keys) 4. Establish table relationships 5. Clarify business rules critical to database design (e.g., required fields, validation rules) 6. Determine and establish user views of data 7. Review data integrity and reiterate design methodology Statement of Purpose 1. Declare a specific purpose for the database to focus and guide its development e.g., “The purpose of the All-Star Talent database is to maintain the data we use in support of the entertainment services we provide to our clientele.” 2. Articulate goals & objectives that define specific tasks “We need to maintain complete entertainer information.” “We need to maintain complete customer information.” “We need to track all customer-entertainer bookings.” “We need to maintain financial records of both payments from customers and payments to entertainers.” Assessment of User Requirements: What is analyzed? interview transcripts meeting minutes observational notes • reports business mission and strategy statements • presentations questionnaire results document analyses • business forms • flow charts • computer-generated output • training manuals • consultant reports • job descriptions Assessment of User Requirements: Specific requirements Goals of analysis of user requirements: collect a list of business goals, entities to track, a database schema, and sample report outputs. What are subjects/objects for the business? What characteristics describe each object? What unique characteristic distinguishes each object from other objects of the same type? How do you use this data (e.g, summary reports)? Over what period of time are you interested in this data? Are all instances of each object the same? What events occur that imply associations between various objects? Is each activity or event always handled the same way or are there special circumstances? Rules for Conducting User Interviews Create a quiet, stress-free environment; set a limit of six people Have an agenda - provide it to participants ahead of time Focus on the problem at hand; maintain control of the interview Conduct separate interviews for users and management Identify the decision maker Avoid technical jargon Show concern for user needs Give everyone equal and undivided attention Write down everything where it can be seen by participants Encourage ‘blue sky’ thinking Arbitrate disputes Keep the pace of the interview moving Don’t foreclose your options too soon Data Modeling A model is a simplified representation (usually a graphic) of a complex object in reality to make it understandable If the elements in the model are correctly associated with elements in reality, the model can be used to solve problems in reality (e.g., engineer’s model to determine a bridge’s weight tolerance; if the model is incorrect...) an ER model is integrated set of concepts that describes data, relationships between data, and the constraints on the data as they are used within a specific organization; a data model renders organization’s (users’) view of objects and/or events and their associations ER model is a blueprint from which a well-structured database is created ER models are independent of details of implementation Purpose of Data Modeling High level description • integrated set of concepts that describes data, relationships between data, and the constraints on the data as they are used within a specific organization View of objects or events • data model renders organization’s (users’) view of objects and/or events and their associations Representation of data in a simplified fashion that makes it understandable Once established, database design is relatively straightforward E-R Modeling Concepts 1 : 1 Connectivity Entities M : N Relationship Type Participation Objects Cardinality Relationships Degree Recursive Binary Domains Attributes Values 1 : N Ternary N-ary Mandatory Optional Entities Entity • Something that can be identified in the users’ environment about which we want to store data; typically is a noun • Entities or objects must have occurrences that can be uniquely identified • Identified by an organization or its users • Consists of tangible or intangible objects or events Entity Instance • A single entity occurrence or instance within a collection of entities e.g., STUDENT is an entity; Annie Abel is an entity instance as are Bob Brown and Cathy Chen. STUDENT Attributes properties that describe characteristics of an entity - assumed all instances of a given entity have the same attributes • • • • use atomic attributes, those that cannot be divided further (e.g., not composite attributes (e.g., use last name & first name, not name) do not use derived attributes (attributes that can be calculated using other attributes; e.g., age) use single value attributes not multi-valued (e.g., medication1, medication2, etc.) multi-valued attributes, if they have their own important attributes should be elevated to entities e.g., attributes of the entity STUDENT might include name, address, etc. phone # first name last name STUDENT photo birth date Identifiers Each entity occurrence has a unique identifier The identifier is an attribute (or group of attributes) that describes or identifies each entity occurrence An identifier should be unique to each occurrence Referred to as a ‘primary key’ in relational models e.g., in the list of potential attributes of the entity STUDENT, the identifier could be Student Number. STUDENT StudentID, ... Relationships Association or connection between two or more entities • • • Usually a verb HAS-A is also a common relationship (EMPLOYEE-has a-DEPENDENT) E-R model also contains relationship classes STUDENT StudentID, ... takes COURSE CourseID, ... Degree of Relationship: Binary In a binary relationship, two entities are associated. This is the most common degree of relationship. VACATIONER EMPLOYEE takes works for TRIP DEPARTMENT Degree of Relationship: Ternary In a ternary relationship, three entities are associated. WRITER ILLUSTRATOR CUSTOMER WAREHOUSE create order DESIGNER ITEM Degree of Relationship: Unary (Recursive) In a recursive relationship, one entity is associated with itself. TEAM COURSE plays requires Recursive Relationships A relationship can exist between occurrences of the same entity set Implementation of the M:N Recursive “PART Contains PART” Relationship Implementation of the M:N “COURSE Requires COURSE” Recursive Relationship Implementation of the 1:M “EMPLOYEE Manages EMPLOYEE” Recursive Relationship Connectivity Connectivity describes constraints on relationship (also referred to as “maximum cardinality”) Number of instances of entity B that can (or must) be associated with each instance of entity A One-to-Many Child One-to-One Employee Many-to-Many Musician 1 rents M 1 has 1 M sings N Toy Office Song Representing M:N binary relationships M:N relationships are represented by two 1:M relationships. the relationship is itself an entity, called a composite entity (rectangle around the diamond) The composite entity often has its own attributes STUDENT STUDENT M 1 M Date N enrolls in enrolls in M CLASS 1 Mark CLASS Cardinality Cardinality is the specific number of entity occurrences associated with one occurrence of the related entity often referred to as ‘business rules’ because cardinality is usually determined by organizational policy e.g., at a toy lending library, a given child may not borrow any toys at all or they may borrow more than one (up to 3) toys. A toy may not be borrowed by anyone, or it may be borrowed by one child. Child 1 (0,3) borrows M (0,1) Toy Occurrences Diagram Pictorial mapping of the occurrences between two entities assists in understanding connectivity and cardinality C1 C2 C3 C4 C5 C6 T1 T2 T3 T4 T5 T6 A child may rent between 0 and 3 toys; a toy may only be rented by 0 or 1 child. One child may rent many toys (1:M) Relationship Participation Also referred to as “minimum cardinality” Mandatory Participation • An instance of a given entity must definitely match an instance of a second entity • e.g., each student must enroll in exactly one course Optional Participation • An instance of a given entity does not necessarily participate in the relationship • lower bound of cardinality is zero • e.g., a faculty member teaches zero, one, or two courses 1 N DONATION MEMBER makes MANDATORY (0,N) (1,1) OPTIONAL a member may or may not make a donation but a donation must be associated with a member Example: Customers & Orders From the CUSTOMER perspective: – a customer may make many orders (M orders of 1:M connectivity) – a customer does not necessarily make orders (optional participation of orders, cardinality is (0,N)) 1 CUSTOMER makes (0,N) M ORDER (1,1) From the ORDER perspective: – an order is made by (associated with) one and only one customer (1 customer of 1:M connectivity) – an order must be made by (associated with) a customer (mandatory participation, cardinality is (1,1)) Example: Customers & Orders common field parent table related table Entity-Relationship Design First step - develop a list of entities and attributes that are of interest to the enterprise Entities or objects must have occurrences that can be uniquely identified E-R Design consists of determining entities, attributes, relationships, relationship types, level of participation in relationships, identifiers, and then drawing an E-R Diagram E-R Diagram (model) is a blueprint from which a well-structured database is created Steps in Entity-Relationship Modeling 1. Identify entities 2. Identify relationships 3. Determine relationship type 4. Determine level of participation 5. Assign an identifier for each entity 6. Draw completed E-R diagram 7. Deduce a set of preliminary skeleton tables along with a proposed primary key for each table (using rules provided) 8. Develop a list of all attributes of interest (not already listed and systematically assign each to a table in such a way to achieve a 3NF design (i.e., no repeating groups, no partial dependencies, and no transitive dependencies) E-R Method Example: Library Database Step 1. Identify entity types AUTHOR BOOK PUBLISHER Step 2. Identify relationships AUTHOR writes BOOK PUBLISHER publishe s BOOK Library Database (cont’d) Step 3. Determine relationship type. Ask: • Each book is written by how many authors? Each author writes how many books? • Each book may be authored by zero (anonymous), one, or more than one author and each author may write zero, one, or more than one book. The relationship type is many-to-many or: AUTHOR • M writes N BOOK For PUBLISHER-publishes-BOOK, each publisher publishes zero, one, or more books and each book is published by exactly one publisher. The relationship type is one-to-many where BOOKS is on the many side and PUBLISHER is one the one side. PUBLISHER 1 publishe s M BOOK Library Database (cont’d) Step 4. Determine level of participation • Since each book does not have to be authored (anonymous) and since each author does not have to write a book (may make CD) the level of participation is optional for both sides of the relationship of AUTHOR-writes-BOOK combination AUTHOR M writes (0, N) • N BOOK (0, N) For the PUBLISHER-publishes-BOOK combination, the level of participation for PUBLISHER is optional (publishers do not necessarily have to publish a book, perhaps newsletters) and the level of participation for the BOOK side is mandatory (each book must have a publisher) PUBLISHER 1 (0, N) publishe s N BOOK (1,1) Library Database (cont’d) Step 5. Assign an identifier for each entity • AuthorID, ISBN, PublisherID Step 6. Draw completed E-R diagram AUTHOR M AuthorID, ... (0,N) writes ISBN, ... BOOK N (0,N) N (1,1) publishes 1 PUBLISHER PublisherID, ... (0,N) Library Database (cont’d) Step 6. Draw completed E-R diagram - resolve M:N relationships AUTHOR 1 M writes M 1 BOOK (0,N) (0,N) (1,1) (1,1) AuthorID, ... AuthorID,ISBN, ... ISBN, ... M (1,1) publishes 1 (0,N) PUBLISHER PublisherID, ... E-R Modeling: University Example A database is to be set up to record information about faculty, the courses they teach, and the students who take courses. Some courses are taught by teams of faculty members. • Step 1. Identify entity types FACULTY COURSE STUDENT • Step 2. Identify relationships FACULTY teaches COURSE STUDENT takes COURSE University Example Step 3. Determine relationship type. Ask: – – – – F1 F2 F3 F4 F5 F6 (cont’d) Each faculty member teaches how many courses? Each course is taught by how many faculty? Each student takes how many courses? Each course is taken by how many students? • Use occurrences diagram to visualize relationship between entities C1 C2 C3 C4 C5 C6 S1 S2 S3 S4 S5 S6 C1 C2 C3 C4 C5 C6 University Example (cont’d) Step 3. Determine Relationship type (cont’d) • For FACULTY-teaches-COURSE we are told each faculty member teaches zero, one, or two courses. We are told some courses are taught by zero, one, two, or three faculty. This is a many-to-many relationship. FACULTY M teaches N COURSE • For STUDENT-takes-COURSE each student enrols in one to six courses and each course is taken by zero or up to 30 students. This too is a many-to-many relationship. STUDENT M takes N COURSE University Example (cont’d) Step 4. Determine level of participation • FACULTY-teaches-COURSE - level of participation is optional, since sometimes Faculty do not have to teach (e.g., sabbatical); similarly, a course may not have anyone interested in teaching it FACULTY M teaches (0,2) N COURSE (0,3) • STUDENT-takes-COURSE - level of participation is mandatory since students must take at least one course; a course, however, may or may not have students taking it STUDENT M (1,6) takes N (0,30) COURSE University Example (cont’d) Step 5. Assign an identifier for each entity • FacultyID, CourseID, StudentID Step 6. Draw completed E-R diagram N M COURSE (0,3) CourseID, ... M taught by (0,2) N FACULTY FacultyID, ... taken by (0,30) STUDENT (1,6) StudentID, ... University Example (cont’d) You are now told that in addition to the relationships given, each student is assigned a faculty advisor who gives direction in choosing courses. Use occurrences diagram to visualize relationship between entities We are told each student is advised by exactly one faculty advisor. We can assume that each faculty member advises zero, one, or more students. This means the additional relationship is of type one-to-many or 1:M. The STUDENT is on the many side of the relationship and must be advised therefore, faculty is mandatory to student; FACULTY on the one side of the relationship may or may not have a student, therefore student is optional to faculty. 1 FACULTY (0,N) advises M STUDENT (1,1) University Example (cont’d) Step 6. Draw completed E-R diagram 1 M STUDENT M (1,1) (1,6) takes (1,1) M1 COURSE (1,1) (0,30) CourseID, ... 1 (0,3) M (1,1) StudentID, ... taught by M (1,1) 1 (0,2) (0,N) 1 FACULTY FacultyID, ... Evaluation of the E-R Model Using data models to conceptualize the design of a database saves time and money because a completed E-R diagram is the actual blueprint of the database. Its composition must reflect an organization's operations accurately if the database is to meet that organization's data requirements. The completed E-R diagram also lets the designer communicate more precisely with those who commissioned the database design. It’s easier to correct design flaws at the data modeling stage. Do not confuse entities and relationships with actual tables. The transformation or decomposition of E-R models will be discussed within the next few weeks. E-R modeling is an iterative process. Even when complete, ER models generally do not provide a complete picture (e.g., business rules cannot always be shown), therefore, much additional documentation is necessary. Week 2 - Exercises 1-2 Week After Next More entity-relationship modeling exercises Transformation of E-R Models Read Rob Chapter 4 Complete Adamski’s tutorial 3