Bringing It All Together in a Commercial RDBMS Package

• Good relational database software packages allow the user to:
  – Record and display the design of every table, including field names, descriptions, types, ranges, key fields, foreign keys and indexes, in a data dictionary;
  – Record and display relationships between tables;
  – Support at least the fundamental relational functions SELECT, PROJECT and JOIN.

Packages such as Microsoft Access, FoxPro and Oracle support this functionality.

Tuesday, September 14, 1999 90-728 MIS Lecture Notes

Connectivity Clarification

• One-to-one (1-1) relationship:
  – a single occurrence of one entity is associated with a single occurrence of another entity
  For example: PASSPORT 1 ----- 1 PERSON
• One-to-many (1-M) relationship:
  – a single occurrence of one entity is associated with one or more occurrences of another entity
  For example: CAR 1 ----- M INSPECTION
• Many-to-many (M-M) relationship:
  – one or more occurrences of one entity are associated with one or more occurrences of another entity
  For example: CUSTOMER M ----- M PRODUCT

Complex Databases and the Entity-Relationship Model

Last lecture:
  – Presented table designs for transactions
  – Introduced codes for data consistency
  – Defined basic data operations
  – Showed 1-to-many entity relationships: foreign key links between tables
  – Showed table joins: PK/FK links between tables

Today:
  – More abstract view of database design: entity-relationship diagrams as a representation of business rules
  – Service delivery life cycles: tracking events as a data object passes through an information system
  – Implementing a database using the E-R model

Data Models and Levels of Abstraction

A data model uses E-R diagrams to represent important policies and procedures of an organization.

Data models can be used by senior managers and by programmer/analysts.
There are four kinds of data models in I/T design:
  – Conceptual models, in which entities and relationships are represented without reference to hardware or software platforms;
  – Internal models, in which the conceptual E-R diagram is modified for the specific database software platform used;
  – External models, in which the E-R diagram is divided into functional modules with explicit business constraints and common entities;
  – Physical models, which adapt abstract models to hardware- and software-specific design considerations.

Data Models for RDBMSs

Relational database management systems (RDBMSs) shield physical and software-specific details from the end-user. Thus, we will be concerned with the conceptual model of data storage.

Data modeling caveats:
  – Many I/T professionals work at the physical level almost exclusively
  – Many I/T professionals focus on “data flows” rather than “entities”
  – Much real-world database design is done without explicit abstract data models

Entities and Attributes

An entity is a fundamental data element. It corresponds to a table in the relational model.

An attribute is a feature or partial description of an entity. It corresponds to a field in the relational model.
  – Composite attribute: can be divided to yield further attributes
  – Simple attribute: cannot be divided into other attributes
  – Single-valued attribute: can take on only a single value from its domain
  – Multi-valued attribute: can take on two or more values from its domain
  – Derived attribute: its value is calculated via an algorithm and does not have to be stored in the database

Rules for Attribute Representation

• Attributes in E-R diagrams: Include attributes in E-R diagrams only in preliminary stages. Detailed descriptions of attributes are stored in the data dictionary.
  Table    Attribute  Definition          Data Type  Key
  COURSE   CrsNbr     Course Number       Character  PK
  COURSE   CName      Course Name         Character
  COURSE   Credit     Course Credits      Numeric
  COURSE   MaxEnrl    Maximum Enrollment  Numeric
  COURSE   FID        Faculty Number      Numeric    FK
  STUDENT  SID        Student Number      Numeric    PK
  STUDENT  Last Name

• Composite versus Simple Attributes: Use simple attributes whenever possible to minimize the chance of data keying errors or data extraction complications.

Rules for Attribute Representation (cont’d)

• Single-Valued versus Multi-Valued Attributes: Multi-valued attributes cannot be represented directly within the relational model. Instead, either:
  – Define new single-valued attributes:
      Car (Fender_Color, Hood_Color, Trim_Color)
    or
  – Define a new entity set:
      Car 1 ----- M Color
      Color (Car_ID, Car_Part, Part_Color)

What are the pros and cons of these approaches?

Rules for Attribute Representation (cont’d)

• Derived Attributes: For transaction processing, use derived (computed on demand) attributes rather than stored attributes whenever possible.
  – Minimizes the number of columns in a table
  – Attributes may have to be re-calculated for reports or queries anyway

For data warehouses, derived attributes may be stored as explicit fields, since the focus is on data aggregation rather than view generation.
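As a minimal sketch of the derived-attribute rule, the example below computes the number of open seats in each course at query time instead of storing it as a column. COURSE, CrsNbr, CName and MaxEnrl follow the data dictionary above; the ENROLL table, the SeatsLeft name, the seats_remaining function and the sample rows are assumptions made only for illustration. It uses Python's standard sqlite3 module rather than Access.

```python
import sqlite3


def seats_remaining(conn):
    """Return {CrsNbr: open seats}, derived at query time rather than stored."""
    rows = conn.execute(
        """
        SELECT c.CrsNbr, c.MaxEnrl - COUNT(e.SID) AS SeatsLeft
        FROM COURSE AS c
        LEFT JOIN ENROLL AS e ON e.CrsNbr = c.CrsNbr
        GROUP BY c.CrsNbr, c.MaxEnrl
        ORDER BY c.CrsNbr
        """
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE COURSE (CrsNbr TEXT PRIMARY KEY, CName TEXT, MaxEnrl INTEGER);
    -- ENROLL is a hypothetical linking table, not part of the lecture notes
    CREATE TABLE ENROLL (
        CrsNbr TEXT REFERENCES COURSE(CrsNbr),
        SID    INTEGER,
        PRIMARY KEY (CrsNbr, SID)
    );
    INSERT INTO COURSE VALUES ('101', 'Algebra 1', 30);
    INSERT INTO COURSE VALUES ('309', 'Pascal', 2);
    INSERT INTO ENROLL VALUES ('101', 1);
    INSERT INTO ENROLL VALUES ('101', 2);
    INSERT INTO ENROLL VALUES ('309', 1);
    """
)

derived = seats_remaining(conn)
print(derived)  # {'101': 28, '309': 1}
```

Because SeatsLeft is never stored, it cannot drift out of step with ENROLL; a data warehouse might instead materialize it as a column to speed up aggregate reports.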
Entity Relationships

Relationships describe the type of association between entities:
  – Business rule representation: a text description defines a business rule;
  – Number of associations between related entities: n-ary relationships;
  – Connectivity: the number of instances of one entity that are uniquely associated with one or more instances of another entity;
  – Cardinality: the number of entity occurrences associated with a specific entity;
  – Existence dependency: an entity may or may not exist if a related entity does not exist;
  – Relationship participation: an entity can exist independently of another entity (optional) or must be associated with another entity (mandatory);
  – Weak entities: a weak entity is existence-dependent and has a primary key derived from that of the associated entity.

Unary (Recursive) Relationships

Entities can be related to themselves in a variety of ways:
  – A course can be a prerequisite for another course;
  – A part can be assembled from one or more other parts;
  – An employee can be supervised by another employee.

Representation of unary relationships depends on the connectivity associated with the recursion:

If a course has at most one prerequisite, then add a Prerequisite field to the COURSE table.

  COURSE 1 ---(is a prereq)--- 1 COURSE

  CrsNbr  Cname      Prerequisite
  101     Algebra 1
  102     Algebra 2  101
  103     Trig       102
  309     Pascal     101
  311     C          309

If a course can have many prerequisites, use a linking table.

  COURSE M ---(CCLink)--- N COURSE

  COURSE                 CCLink
  CrsNbr  Cname          CrsNbr  Prerequisite
  101     Algebra 1      103     101
  102     Algebra 2      103     102
  103     Trig           311     102
  309     Pascal         311     309
  311     C

Cardinalities and Business Rules

Cardinality determines how many times a related row from one table can appear in another table.
For example, a business rule associated with student preferences for school transfers may specify that:
  – a student can list at most nine schools at which he/she wishes to be considered for acceptance next year, and
  – a student must list at least one school.

  STUDENT 1 ---(may rank)--- M PREFERENCES
  Cardinalities: each STUDENT ranks (1, 9) PREFERENCES; each PREFERENCE belongs to (1, 1) STUDENT

A business rule requiring that a student list his/her current school as one of the preferences can be implemented only at the application software level.

Weak Entity Sets

Weak entity sets are useful when business rules do not permit keys that are unique to various entities.
  – For example, a tips hotline for reporting cars that may be stolen could be implemented as:

  CALLER 1 ----- M CALL 1 ----- M CVLINK M ----- 1 VEHICLE

  CALLER (SSN, First Name, Last Name, Phone Number, . . .)
  VEHICLE (VIN, Plate Number, State, Make, Model, Year, Color, . . .)
  CALL (Call#, Date, Time, Address, SSN@, VIN@, . . .)
  CVLINK (Call#@, VIN@)

However, callers may be unwilling to identify themselves, and there may be only sketchy information on the vehicles. CALLER and VEHICLE become weak entities, each unable to have a unique key of its own.

Weak Entity Sets (cont’d)

CALLER and VEHICLE share the (unique) key of CALL, Call#. Thus, if the same car is reported by three different callers, the car appears in VEHICLE three times. Also, every time a person makes a call, the caller is included in the database again:

  CALLER 1 ----- 1 CALL 1 ----- M VEHICLE

  CALLER (Call#, SSN, First Name, Last Name, Phone Number, . . .)
  VEHICLE (Call#, Vehicle#, VIN, Plate Number, State, Make, Model, Year, Color, . . .)
  CALL (Call#, Date, Time, Address, SSN@, VIN@, . . .)

This is another example of the identification and application of business rules.

Generating “Views” of Data

Users often want to see data from multiple tables combined in an intuitive way.
To create views of data, perform multi-table joins:
  – (i) Choose an entity set (table) that has not yet been processed. Call the chosen table the "row driver" of the view.
  – (ii) Include in the view all tables that fall along relationship paths starting from the row driver and that have a cardinality of 1 pointing away from the row driver.
  – (iii) Return to step (i) until all desired tables have been processed.

Example: How can we generate views of data associated with sales events?

  PRODUCT 1 ---(appears in)--- M SPLINK M ----- 1 SALE
  SALE M ---(requests)--- 1 CUSTOMER
  SALE M ---(conducts)--- 1 SALESPERSON

Generating Views of Data (cont’d)

Trivial views: generated by a single table.

Non-trivial views: generated by multiple tables in one-to-one or one-to-many relationships. To build one, add each component entity to the view one by one until all desired tables are included.

For example:
  – SALESPERSON, CUSTOMER, and PRODUCT have only the trivial views of themselves
  – SALE has the view: SALE + SALESPERSON + CUSTOMER
  – SPLINK has the view consisting of every table: SPLINK + PRODUCT + SALE + CUSTOMER + SALESPERSON

In Access, views are implemented as queries consisting of a series of joins.

Multiple Linking Tables

Business rules may require that a database have more than one linking table.
  – For example, a hospital operations database may have the following rules:
    • An operation has a single patient but many doctors, each with a different role
    • A patient may have more than one procedure performed in a single operation
    • A patient may have several post-operative drugs.
  +--------+           +-----------+
  | ROLE   |           | PATIENT   |
  +--------+           +-----------+
      |1                     |1
      |M                     |M
  +--------+M         1+-----------+1        M+---------+
  | ODLINK +-----------+ OPERATION +----------+ OPDLINK |
  +--------+           +-----------+          +---------+
      |M                     |1                    |M
      |1                     |M                    |1
  +--------+           +-----------+          +---------+
  | DOCTOR |           | OPLINK    |          | POST-OP |
  +--------+           +-----------+          | DRUG    |
                             |M               +---------+
                             |1
                       +-----------+
                       | PROCEDURE |
                       +-----------+

Multiple Linking Tables (cont’d)

The E-R diagram could be implemented as:

  ROLE (Role Code)
  PATIENT (Patient#, . . .)
  DOCTOR (Doctor#, . . .)
  PROCEDURE (Procedure Code, Procedure Name)
  POST-OP DRUG (Drug Code, Drug Name)
  OPERATION (Operation#, Patient#@, Date, Start Time, . . .)
  ODLINK (Operation#@, Doctor#@, Role Code@)
  OPLINK (Operation#@, Procedure Code@)
  OPDLINK (Operation#@, Drug Code@)

Data views of interest could include:
  – Number of operations of various types performed by each doctor: ODLINK + DOCTOR + OPERATION
  – Drugs used on patients during recent operations: OPDLINK + POST-OP DRUG + OPERATION + PATIENT
  – All procedures performed on patients: OPLINK + PROCEDURE + OPERATION + PATIENT

Service Delivery Life Cycles

It may be useful to track the operations or steps associated with a particular event as it winds its way through a system:
  – Patient intake and treatment in a hospital
  – Handgun tracing to detect firearms used in crimes
  – Charitable pledge tracking

Example: Lost and found department of an agency
  – Policy is that the same day an item is found, a notice must be posted describing the item and stating that it has been found.
  – After 30 days, if no one has claimed the item, a second notice is posted.
  – Finally, 30 days after the second notice, if no one has claimed the item, the item is disposed of.
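The timing in the policy above can be sketched as simple date arithmetic. This is only an illustration: the function name life_cycle_dates, the dictionary keys and the sample found date are assumptions, and the sketch assumes that "30 days" means calendar days.

```python
from datetime import date, timedelta

NOTICE_INTERVAL = timedelta(days=30)  # assumed to mean 30 calendar days


def life_cycle_dates(found_on):
    """Return the dates on which each policy step falls due for an unclaimed item."""
    first_notice = found_on                        # posted the day the item is found
    second_notice = first_notice + NOTICE_INTERVAL
    disposal = second_notice + NOTICE_INTERVAL
    return {
        "first_notice": first_notice,
        "second_notice": second_notice,
        "disposal": disposal,
    }


schedule = life_cycle_dates(date(1999, 9, 14))
for step, due in schedule.items():
    print(step, due.isoformat())
```

For an item found on September 14, 1999, the second notice falls due on October 14 and disposal on November 13, provided no owner claims the item in the meantime.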
Problem: Find a way to track each found item through its life cycle stages until it comes to a final disposition.

Solution: Use a decision tree with codes for each branch.

Service Delivery Life Cycles (cont’d)

  Item Found
    -> First Notice Sent
         -> Owner Found After First Notice
         -> Second Notice Sent
              -> Owner Found After Second Notice
              -> Item Disposed

For decision support purposes, it may be useful to associate probabilities with each potential event (tree branch).

Service Delivery Life Cycles (cont’d)

• What are the branching frequencies?
• What are the branching probabilities?
• What are the average durations until owners are found?

Converting an E-R Diagram into a Database Structure

• Define tables and primary keys
• Define attributes based on cardinality restrictions and primary key definitions
• Define indexes for certain attributes or combinations of attributes
• Define table relationships
• Allow cascade updates/cascade deletes if business rules allow
• Build data dictionary:
  – Attribute name, description and data type
  – Attribute cardinality
  – Attribute domain
  – Example data elements

Computer-aided software engineering (CASE) tools automate many of these steps.
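The conversion steps above can be sketched on a fragment of the hospital example (PATIENT, OPERATION, OPDLINK and POST-OP DRUG). This is a hedged illustration using Python's standard sqlite3 module, not the Access workflow the notes assume. The column names, the table name POSTOP_DRUG, the index name, the count_rows helper and the sample rows are all hypothetical, and the cascade delete on OPDLINK assumes the business rules permit it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript(
    """
    -- Define tables and primary keys
    CREATE TABLE PATIENT (
        patient_no INTEGER PRIMARY KEY,
        last_name  TEXT NOT NULL
    );
    CREATE TABLE POSTOP_DRUG (
        drug_code TEXT PRIMARY KEY,
        drug_name TEXT NOT NULL
    );
    -- Define relationships: the "many" side carries the foreign key
    CREATE TABLE OPERATION (
        operation_no INTEGER PRIMARY KEY,
        patient_no   INTEGER NOT NULL REFERENCES PATIENT(patient_no),
        op_date      TEXT
    );
    -- Linking table; cascade delete assumed to be allowed by business rules
    CREATE TABLE OPDLINK (
        operation_no INTEGER NOT NULL
            REFERENCES OPERATION(operation_no) ON DELETE CASCADE,
        drug_code    TEXT NOT NULL REFERENCES POSTOP_DRUG(drug_code),
        PRIMARY KEY (operation_no, drug_code)
    );
    -- Define an index on an attribute that is joined on frequently
    CREATE INDEX idx_operation_patient ON OPERATION(patient_no);

    -- Made-up sample rows
    INSERT INTO PATIENT VALUES (1, 'Doe');
    INSERT INTO POSTOP_DRUG VALUES ('D1', 'Drug One');
    INSERT INTO OPERATION VALUES (10, 1, '1999-09-14');
    INSERT INTO OPDLINK VALUES (10, 'D1');
    """
)


def count_rows(table):
    # Table names come from this script only, never from user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


links_before = count_rows("OPDLINK")
conn.execute("DELETE FROM OPERATION WHERE operation_no = 10")
links_after = count_rows("OPDLINK")  # the linking row is removed by the cascade
print(links_before, links_after)
```

Deleting the operation removes its OPDLINK row automatically, while PATIENT and POSTOP_DRUG rows are left in place because no cascade is declared toward them.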