DAMA, 2001 December. ORMvER 1 What’s Wrong With ER Modeling? Gordon C. Everest Carlson School of Management University of Minnesota Problems and Solutions ORMvER 2 OBJECTIVES FOR THIS PRESENTATION: • Show several PROBLEMS with ER modeling schemes, (actually, any “record-based” modeling scheme). • Identify the ROOT CAUSE of the problem To stop there would be irresponsible, so… • Show you a better way – a SOLUTION using Object Role Modeling (ORM) • NOT asking you to abandon what you have learned about data modeling and are doing in practice • BUT to defer thinking in terms of entity records, and to begin doing data modeling at a richer, more conceptual level Data Modeling DMOD 3 What is the Dominant Data Modeling Scheme today? What’s Wrong with ER Modeling? BEFORE WE CAN ANSWER THAT: Why Do Data Modeling? How do we do Data Modeling? Why do we need Normalization? Database Design DMOD 4 Objective: (WHAT we are trying to do) TO ACCURATELY AND COMPLETELY MODEL SOME PORTION OF THE REAL WORLD UNIVERSE OF DISCOURSE (UoD) OF INTEREST TO SOME ORGANIZATION OR COMMUNITY OF USERS. Logical Database Design Objective, Principles, Benefits DMOD 5 · OBJECTIVE of LOGICAL DATABASE DESIGN: TO ACCURATELY AND COMPLETELY MODEL SELECTED PORTIONS OF THE REAL WORLD OF INTEREST TO A COMMUNITY OF USERS. • USERS (COLLECTIVELY) WILL ALWAYS KNOW MORE ABOUT A DATA STRUCTURE THAN THE SYSTEM KNOWS, OR THAN COULD BE DEFINED TO THE SYSTEM. • WHAT IS NOT FORMALLY DEFINED TO THE SYSTEM, THE SYSTEM CANNOT MANAGE . . . THE USERS MUST! • THEREFORE, NEED TO CAPTURE RICH SEMANTICS WITH COMPREHENSIVE DATA MODELING and DEFINITION, INCLUDING INTEGRITY CONSTRAINTS AND OPERATIONS. FOR ==> GREATER QUALITY & RELIABILITY IN DATA ==> GREATER USER CONFIDENCE. ==> HIGHER USER / DEVELOPER EFFICIENCY Let the ‘system’ do it! Purpose of Data Modeling (WHY we do it) DMOD DUAL, CONFLICTING PURPOSES DRIVE THE PROCESS: 6 USE R • Facilitate Human Communication, Understanding, & Validation – capture and present meaning, the semantics of a model – direct representation of only essential model semantics PRESENTATION CHARACTERISTICS: – scoping and presenting subparts of a Model – unfolding presentation at different levels of abstraction or detail – visual prominence in proportion to semantic importance SECONDARY: • Basis for Implementation - defining & creating a Database – complete in all the necessary details – construction/generation able to be fully automated SCHEMA DATABASE Modeling DMOD 7 (Re).present.(ation) present Reality (mental models) MODELING PROCESS MODEL Re.present Knowledge in the head Knowledge in the world Knowledge externalized, formalized, shared. What drives or guides the process? The Modeling Process DMOD 8 METHODOLOGY: Steps/Tasks + Milestones + Deliverables Real World Universe of Discourse + MODELING SCHEME perception selection/filtering Context Constructs Composition Constraints MODELING PROCESS REPRESENTATIONAL FORMS: Narrative, Graphical Diagram, Formal Language Statements (the Syntax) MODEL A Data Modeling “Scheme” DMOD 9 DEFINES the: • Context • Constructs (ENTITIES, OBJECTS) • Collections, Compositions, Connections (RELATIONSHIPS) • Constraints, Characteristics WE LOOK FOR IN THE “REAL” WORLD UoD or Domain of Interest and USE IN BUILDING A DATA MODEL. Data Modeling Constructs DMOD 10 What to look for: Relative emphasis differentiates Data Modeling approaches ENTITY RELATIONSHIP (OBJECT) IDENTIFIER ATTRIBUTE characteristics [ FOREIGN KEY ] characteristics Student-Course Database - Table Diagram DMOD 11 Diagram of the Schema: COURSE Course# Title Description Credits INSTRUCTOR SSN LastName FirstName Address Phone Dept COURSE OFFERING Course# Year Term Section Building Room Days Time Start Control Enrollment Instructor SSN What if you move the arrow head to the other end of the arc? STUDENT Student ID Name Address Major GPA REGISTRATION Course ID Student ID Grade LEGEND: ENTITY NAME (upper case) Identifier (bold face) Attributes (not bold face) Foreign Key Identifier M:1 relationship Student-Course Database – Populated DMOD 12 Actual instances of data values: COURSE: Course# Title Credits ACC101 Intro Accounting 4 ENG101 English Composition 4 MIS101 Intro MIS 4 MIS103 Intro Database 4 MIS403 Advanced Database 2 … COURSE OFFERING: CRSO# Course# Year Term Sect Room 1004 MIS101 2000 Fall 001 1-142 1017 MIS101 2001 Spr 002 2-224 3001 MIS103 2000 Fall 001 2-207 … INSTRUCTOR: InstrID Name 33741 Allen, Lillian 85959 Boyd, Don 64578 Carlis, John 11248 Davis, Gordon 77004 Everest, Gordon 55432 Fine, Alan … Dept Eng ACC CSci IDS IDS IDS Secondary (Composite) Key STUDENT: StudentID 1111111 2222222 3333333 4444444 5555555 … InstrID 11248 55432 77004 Name Major GPA Able, Emma MIS 3.4 Bright, Sue MIS 3.9 Challenger, X ACC 2.7 Dummie, Noe ACC 3.2 Everest, Monty MIS 3.8 Enroll 48 60 27 REGISTRATION: CRSO# StudentID 1004 4444444 1017 3333333 3001 1111111 3001 2222222 3001 5555555 3001 7777777 … Grade B+ B A A A- Data Modeling – Schema Diagram DMOD 13 THINKING ABOUT ATTRIBUTES: Record-Based: ENTITY IDENTIFIER ATTRIBUTE ATTRIBUTE ... Essentials of ER Modeling / Diagramming DMOD 14 ENTITY1 1 RELATIONSHIP M ENTITY2 Adding Attributes, omitting the Diamond: identifier ENTITY1 ENTITY2 ============ IDentifier 1 --------------------Attribute 1.1 Attribute 1.2 Attribute 1.3 : ============ IDentifier 2 --------------------Attribute 2.1 Attribute 2.2 Attribute 2.3 ForeignID 1 : ENTITY1 ENTITY ATTRIBUTE ATTRIBUTE ... Attribute2 Attribute3 ENTITY2 IDENTIFIER Attribute1 Attribute ORMvER 15 What’s wrong with ER Modeling? ________ ER / Record-based Modeling DMOD 16 VALUE DOMAIN VALUE DOMAIN VALUE DOMAIN VALUE DOMAIN ... roles ID TABLE: X ATTRIBUTES . . . A B C D CLUSTERING of ATTRIBUTES into RECORDS/RELATIONS – NOT a necessary or desirable first step – gets us into trouble: if too much, must decompose to normalize Record-based Design ORMvER 17 WHAT SEMANTICS ARE PRESUMED BY THE FOLLOWING RECORD STRUCTURE? X A B C • What does it say about X ? • What does it say about A ? • What does it say about the relationship X – A ? • What does it say about the relationship A – B ? There are at least 14 distinct semantic statements you can make in answering these questions! • Do we know it is in Third Normal Form (3NF)? How? Record-based Design ORMvER 18 WHAT DOES IT SAY ABOUT X A X? B C Record-based Design ORMvER 19 WHAT DOES IT SAY ABOUT X A A? B C Record-based Design ORMvER 20 WHAT DOES IT SAY ABOUT THE RELATIONSHIP X A B C X–A ? Record-based Design ORMvER 21 REPRESENTING THE RELATIONSHIP X A A N B D ... C X–A ? Record-based Design ORMvER 22 WHAT DOES IT SAY ABOUT THE RELATIONSHIP X A B C A–B ? Record-based Design ORMvER 23 REPRESENTING COMPLEX RELATIONSHIPS AMONG X A ? A ... B C B A? ... X, A, & B . Separately consider the relationship between A and B. What if it is many-to-many? What if other information is functionally dependent on A–B ? Record-based Design - Compound Key ORMvER 24 WHAT IS PRESUMED BY THE FOLLOWING RECORD STRUCTURE? X Y A B C Major Data Modeling Schemes DMOD 25 Everest-DM-4p.121. (1) SINGLE FILE (E-A) - FLAT FILE “TABLE” - HIERARCHICAL - nested repeating groups e.g., COBOL (M) MULTIFILE (E-R → E-A-R) - NETWORK - hierarchical records - RELATIONAL (E-A-[R]) - flat records (O) NO FILE (O-R) (No Clustering of Data Items into Records) - NIAM/“Binary” Modeling - ORM (Object-Role Modeling - Halpin) RECORDBASED (Clustered Data Items) Data Modeling Schemes DMOD 26 CLASSIFIED by Degree of Clustering: • No clustering – NIAM/ORM - Nijssen, Halpin • Clustering to One Level => Atomic Data Values – – – – – – – Relational Modeling - Codd ER Modeling - Chen Extended ER (EER) - Teorey Information Engineering (IE) – Clive Finkelstein -> James Martin Oracle (Designer*2000) - Barker IDEF1X - Appleton, US Gov’t, ERwin (tool), Bruce (book) • Nested Objects – – – – – – Hierarchical data structure (single file; COBOL) CODASYL Network (ANSI NDL) Nested Relations Semantic Object Modeling (SOM) – Kroenke, Salsa (tool) Object Modeling (UML) – Rational Rose (tool) ANSI SQL:1999 Data Modeling Schemes – Clustered DMOD 27 HIERARCHIC special case NETWORK ER - - single file - nested repeating groups - implicit hierarchical relationships - multifile, hierarchical record - defined relationships => semantic/ OBJECT models Focus on E & R, hidden record structure Usually flat records [optionally with attributes] Defined relationships (general M:N) Usually restricted to binary relationships RELATIONAL - Multifile; flat records only - Relationships as foreign keys so no M:N relationships Taxonomy of “Clusterered” Data Structures DMOD 28 Single File Multiple Files Flat SINGLE FLAT FILE (“TABLE”) RELATIONAL (“TABLES”) Nested HIERARCHICAL FILE (CODASYL) NETWORK Intra-Record Structure Clustered Stages of Data Modeling DMOD 29 Start at the highest Conceptual Level! USE R Domain Knowledge CONCEPTUAL ER CLUSTERED ORM “LOGICAL” Attribs in Records RELATIONAL • Objects MultiValued, • Obj. ID’s PHYSICAL Nested - - - - - -> Flat (1NF) • Roles/Relships Ternaries - - - - - -> Binary only • Implementation • (Fnl. Dep) in/for a DBMS M:N - - - - - - - - - -> 1:Many only NO clustering • Denormalize Normalized (2,3,4) Primary Keys (for performance) => NO “attributes” Relationships - - -> Foreign Keys + triggers, stored w/attributes procedures Sub/SupTypes SCHEMA DATABASE Data Modeling - Representation Stages DMOD A SECOND CUT: 30 USE R NEW • Conceptual (ORMHALPIN/NIJSSEN SUMMFULTON UDMCDMTG) – only what the user knows or needs to know – functional dependencies fully represented – Elementary Facts - no clustering of “attributes” into “records” • Clustered (ERCHEN EERTEOREY SDMMcLEOD SOMKROENKE SQL:99ANSI UML) – identifiers (attributes or dependent relationships) – keep: M:N, ternary relationships, super/subtypes, attributed relationships, multi-valued items/rgroups • “Logical” (RELATIONALCODD SQLANSI ) – flat files/tables; – stored identifiers; – 3NF (decompose) – resolve: M:N, ternary, super/subtype relationships – foreign keys to represent relationships SCHEMA • Denormalize (Recluster) - for performance • Physical (IMPLEMENTATION in a DBMS) – triggers, stored procedures, user code to DATABASE represent and enforce semantics beyond the DBMS. Data Modeling Schemes - ER DMOD 31 • ENTITIES, that have ATTRIBUTES, and participate in RELATIONSHIPS. • Originated with Peter Chen, 1976, TODS (1:1) • Notation has evolved, many variations – Drop diamond; attributes inside entity box or suppressed. • • • • • No standard syntax notation (but similar semantics) Common: attributes clustered into entity records. Most popular today Weak entity Association entity Relationship naming: one name, direction unstated, thus ambiguous; need direction (>) or rule (eg. left to right). EMPLOYEE EmpNo M EmpName works in … 1 UnitNo DEPT Name … Data Modeling Schemes - Oracle DMOD 32 • • • • In Oracle Designer*2000 tool (R. Barker, A-W, 1990) A flavor of ER modeling ENTITY in rounded box; optionally ATTRIBUTES inside ATTRIBUTE flags: # - [part of] identifier * - mandatory o – optional • RELATIONSHIPS: - binary only - two names at end from which to be read - optional ---, mandatory —, many - identifying ———, fixed ——— EMPLOYEE EmpNo (#) EmpName (*) Address (o) works in employs DEPT Data Modeling Schemes - IE DMOD 33 • • • • • • Information Engineering (1970’s) Due to Clive Finkelstein, adapted by James Martin Used in several tools: IEF, IEW/ADW/Cool, ICES, … Widely used, many variations, no single standard ENTITIES: in boxes, optionally with ATTRIBUTES, in or out RELATIONSHIPS: - usually binary only - many ——— , at most one ——— - optional ——— (at the “other” end) - mandatory, at least one ——— EMPLOYEE DEPT Data Modeling Schemes – IDEF1X DMOD 34 • U.S. Air Force/Defense (1970’s), Appleton eXtensions • • • • • NIST (U.S. Govt) standard – 1993; revised in IDEF1X97; IEEE - 1998 Book by T. Bruce, 1992; Used in ERwin (now from CA), Visio, … Widely used in and for U.S. Govt work, some outside Some Relational restrictions: Foreign Keys, thus no M:N “Unnecessarily complex, confusing, and forgettable” - Halpin • ENTITY: independent , dependent • ATTRIBUTE flags: - Alternate Key - (AKi), Foreign Key - (FK) - optional (O) – mandatory is default • RELATIONSHIPS: - binary only, “child” ——— (may be arbitrary) - First Name always read toward the child - identifying —— , non identifying ----- “cardinality” on child: P - one or more, Z - zero or one, n - exactly n ----- Parent is optional (some allow many parents) EMPLOYEE EmpNo EmpName SS# (AK1) Address (O) UnitNo (FK) employs/ works in DEPT DeptNo DeptName : Forming a Relational Data Structure RELSQL 35 Some rules: • Define a TABLE or “Relation” for each Entity type – Types of Entities: base/reference, dependent (“weak”), association/intersection, event/transaction – Assumes mutually exclusive (non-overlapping) populations • SINGLE-VALUED ITEMS (“flat” tables) – If multivalued or nested repeating group of items, put into a separate table • IDENTIFIER for every table (entity “integrity”) • FOREIGN IDENTIFIERS to represent all relationships 1:M - stored in the child / dependent entity 1:1 - should probably merge into one table M:N - must introduce an association/intersection table • NORMALIZE to second and third normal form – important for good design – but not enforced by RDBMS... WHY? Functional Dependency in Relationships RELSQL 36 Basis for Database Normalization. A f (X) is functionally dependent on X A determines X A A is dependent on X, and the Relationship is exclusive on A, multiple on X. Clustered into a Record/table for entity of X: X A … There can only be one There can be multiple There can be different A for each X . Xs for a given A . As for the Xs . Database Normalization RELSQL 37 Start with ENTITIES, their IDENTIFIERS (unique keys) and their ATTRIBUTE FIELDS (facts about each entity). i.e., start with data items clustered into records/tables. PROBLEM: we may do it wrong; cluster too much; some items in the wrong place, which can lead to redundancy & update anomalies. Any Flat File is a Relation, but… not all Relations are “well-formed.” • NORMALIZATION is the test – a set of rules to perform internal validation of a data model • Record DECOMPOSITION is the remedy. – Removing attributes from the entity record, and placing them in a different, often a new entity record (1) First Normal Form: no multivalued items or rgroups. (2) Second Normal Form: no partial dependencies. (3) Third Normal Form: no transitive dependencies. “Every non-key data item must be single-valued, and dependent upon the key, the whole key, and nothing but the key… so help me Codd.” Anomalies RELSQL 38 Resulting from (clues to) poor database design: EMPLOYEE# o • • • • • EMPNAME SKILL PROFICIENCY … BOSSNAME DEPT# DEPTNAME DEPTNAME and BOSSNAME stored redundantly if EMPLOYEE moves to another DEPT#, DEPTNAME and BOSSNAME would also change, needing update. If a DEPTNAME (or BOSSNAME) for a DEPT changes, must update all occurrences, else inconsistency. To delete a DEPT you must also delete all its EMPLOYEEs (unless null foreign keys allowed!) If you delete the last EMPLOYEE in a DEPT, you also delete that DEPT (unless null keys allowed!…multiple?) No place to insert a DEPT# and its DEPTNAME, if there are no EMPLOYEEs there. Summary of all Normal Forms RELSQL 39 GIVEN: – a set of attributes, clustered into tables/records with identifiers – all functional dependencies on the attributes • No multi-valued, non-key attributes (1NF) • No partial dependencies on non-key attributes (2NF) • No transitive dependencies in non-key attributes (3NF) • No partial or transitive dependencies within any key (EKNF, BCNF), i.e., consider all candidate keys. • No multiple, independent multi-valued attributes in the same table (4NF) • No join dependencies, i.e., a relation can be reconstructed without loss of information by joining some of its projections (5NF). • No more than one table with the same key (“minimal”). • No transitive dependencies across tables (“optimal”). NOTE: number order is artificial, i.e., there is no necessary sequence to the normal forms. Normalization – Testing your Understanding RELSQL 40 Assuming that A is single valued with respect to X (i.e. 1NF). GIVEN: Could you have a violation of: (if not, why not?) X A X A X A X A 2NF? 3NF? 4NF? B 2NF? 3NF? 4NF? B 2NF? 3NF? 4NF? B What does this diagram mean? How does this differ from diagram above, if any? MUST DISTINGUISH THE PRIMARY KEY . Representing a M:N Relationship DMOD 41 Another Pattern: EMPLOYEE PROJECT • If you cannot store multiple Projects (or Project IDs) in an Employee record, or multiple Employees (or Employee IDs) in a Project record (as is the case in a Relational Database), then … you must introduce an “Intersection Entity” between them to represent the Many-to-Many Relationship. EMPLOYEE EMPL-ID PROJECT PROJ-ID • The Intersection Entity also provides the place to store additional attributes of the relationship e.g., Hours Worked, Rate of Pay, … What is the problem with this representation? Representing a Ternary Relationship DMOD 42 While we can develop a consistent notation for binary relationships, ternary relationships are a problem. EMPLOYEE SKILL PROFICIENCY • If one of the entities is single valued, is it really ternary? Or “attributed” binary? • What lends uniqueness to each instance of the relationship? • How to verbalize the relationship? Which order? • How to represent Multiplicity / Exclusivity ? • How to represent Dependency? Must have all 3? What’s Wrong with ER Modeling? ORMvER 43 I will show you still a more excellent way – PAUL, I Cor 12.31 N Record-based Design B ORMvER 44 WHAT DOES THIS “RECORD” REPRESENT? X X A X B X C A B C Design minimal records with at most one non-key domain. Now what do these “records” represent? Perhaps Codd was right in naming it a _________! Avoids spurious associations, e.g., A – B … Could there be any violations of normal forms? What about the representation of the entity What if A is related to other “entities”? X? Transform Record-based (ER) Design ORMvER 45 TO REALLY REPRESENT THE ENTITY DOMAINS X A B C A X A X B X C Object Role Model: X B C Data Modeling ORMvER 46 THINKING ABOUT ATTRIBUTES: Record-Based (ER): ENTITY IDENTIFIER ATTRIBUTE ATTRIBUTE Object-Role (ORM): ENTITY ... ENTITY ENTITY ENTITY ENTITY (id) ENTITY ENTITY ENTITIES have ATTRIBUTES / DESCRIPTORS by playing roles in relationships with other entities. Record-Based Modeling ORMvER 47 GIVEN TWO FACTS (conceptually): • one about the CITY a PERSON lives in • another about the CITY a PERSON works in ASSUME: • every person has to live and work in a city • each person can live and work in only one city at a time • not interested in anything more about persons or cities EXAMPLE: • Gordon Everest lives in Falcon Heights and * works in Minneapolis DIAGRAM A CONCEPTUAL DATA MODEL – to represent this information (a database to contain these facts) Record-Based Data Model ORMvER 48 for PERSON lives in / works in a CITY • What is the entity and what is the attribute? • Would it make any sense to say (to a novice layperson - a user): – CITY was an "attribute" of PERSON? • Doing more than is necessary at the conceptual level PERSON PersonID [key] LiveCity WorkCity • • • • • cannot have CITY and CITY as attributes of PERSON column/attribute name reflects " entity + role " CITY as an entity/object is lost (not its own table) what if there is a CITY where no one lives or works some add concept of a DOMAIN Object-Role Model ORMvER 49 for PERSON lives in / works in CITY lives in CITY PERSO N (id) (name) works in FORML language statements: • PERSON lives in CITY • Every PERSON lives in some CITY • Each PERSON lives in at most one CITY • ... for works in FACT Record-Based Modeling ORMvER 50 for an additional fact. • A PERSON makes sales calls in multiple CITIES DIAGRAM the extended conceptual data model • can you add an attribute "SalesCallCities" to PERSON? FLAT Record-Based Modeling is even worse: • create a new table SALESCALLS with a compound key – Is this a real entity in the conceptual view? EXTEND THE OBJECT-ROLE DATA MODEL Record-Based Data Modeling ORMvER 51 DISADVANTAGES: • no way to capture the conceptual view directly • must mentally map from conceptual view to the "logical" (record-based) view – by structural groupings of attributes and relationships • must choose unique, arbitrary names – for attributes in a record; for spurious new "entities" • • • • cannot reuse attributes in the same table must do your own normalization hides or ignores inter-attribute relationships creates (implies) spurious inter-attribute relationships Object-Role (ORM) Data Modeling ORMINTRO 52 THE ESSENTIAL DIFFERENCE: • Three main constructs ..rolled into.. Two main constructs Record-based modeling: ENTITY NIAM/ORM modeling: ? ? ? ? What to call it? ATTRIBUTE OBJECT ENTITY ENTRIBUTE! RELATIONSHIP Role in RELATIONSHIP Data Modeling Terminology ORMINTRO 53 O-R E-R ("conceptual") ("logical") COBOL/DBTG ("physical" implementation) ENTITY (TYPE) RECORD TYPE ATTRIBUTE DATA ITEM OBJECT (ELEMENT) FACT SENTENCE INSTANCE RECORD IDENTIFIER PREDICATE RELATIONSHIP CONSTRAINT CHARACTERISTICS RELATIONAL RELATION TABLE COLUMN FIELD ROW TUPLE KEY "SET" FOREIGN KEY CONSTRAINT Fact Sentence - Verbalize ORMODLG 54 • A Fact = a Predicate + Object(s) => Sentence • THINK: Objects playing Roles in a Relationship • Naming: object instances versus object types – e.g. “Ann” is an instance of “Person” • Arity - the number of object “holes” in the Predicate – UNARY: - “Ann smiles” - only 2 states: true/false, present/absent, yes/no - making the closed world assumption – BINARY: - “Ann likes to run” - most common - has an inverse- “Running is liked by Ann” - Inverse name is never the same (else symmetric, handled differently) – TERNARY: - “Ann married Bob in 1967” with types: - “PERSON married PERSON in YEAR” - verbalizing can be difficult with more than 2 (sequence problem) Symbolize: ORM Constructs ORMODLG 55 • OBJECT (ENTITY, CONCEPT) - NOUN … in an ellipse • PREDICATE (RELATIONSHIP) - verb = role name …in a box – unary, binary, ternary, +++ Binary Predicate: PREDICATE OBJECT1 role12 role21 OBJECT2 Elementary Binary Fact Sentence: works in employs PERSON Verbalization: “PERSON works in DEPARTMENT” “DEPARTMENT employs PERSON” DEPARTMENT Adding ORM Constraints ORMODLG 56 works in employs PERSON DEPARTMENT Verbalization: “PERSON works in DEPARTMENT” “DEPARTMENT employs PERSON” DEPENDENCY (MANDATORY): “PERSON must work in some DEPARTMENT” EXCLUSIVITY (UNIQUENESS): “PERSON works in at most one DEPARTMENT” Methodology Steps in OR Modeling ORMINTRO 57 • • • • • Familiarize with real world Universe of discourse Verbalize sentences of elementary facts Symbolize build the conceptual ORM model diagram Constrain the roles in predicates Validate the conceptual data model • Map into neutral, record-based, logical tables • Refine the table definitions • Generate physical database definition for target DBMS VisioModeler Architecture ORMINTRO 58 FORML fact sentences Population Quick Facts Tables FACT EDITOR CONCEPTUAL DATA MODEL DIAGRAMMER correct VERBALIZER VALIDATE BUILD DICTIONARY DICTIONARY "REPOSITORY" "LOGICAL" DATA MODEL (CHECK) refine (TABLES) BROWSER GENERATE PHYSICAL DATABASE STRUCTURE & DEFINITION for a target DBMS Levels of Abstraction in NIAM/ORM DMODPRE 59 REMOVING (generally in order of importance): 1. Lexical Object Types (LOTS); Value Object Types 2. “Terminal” Object types – equivalent to / become “attributes” IF: – play only functionally dependent roles (often only one role) i.e. One:Many relationships; (disjunctive) mandatory (implied) 3. Common Object Types - generic value domains / ref. modes 4. “Event” Object Types 5. Dependent (“weak”) Object Types - Subtypes, Objectified Facts 6. User-defined priority levels on Object Types 7. Constraints and Reference Modes 8. Predicates Sample, Simple ORM Data Model DMODPRE 60 earns EMPLOYEE (number) works in BOSS SALARY (dollars) paid to DEPT (number) employs supervises is headed by reports to superior to ac may spend up to of spending for LIMIT "EmployeeSkill!" { 1000 .. 9999 } possesses <=5 possessed by SKILL (code) has DESCRIPTION (name) is of { 1 .. 10 } with proficiency of assigned to A major criticism of NIAM / ORM, both by protagonists and proponents, is that it is too detailed, a bottom-up design, BUT… ER Diagrams usually omit the details of attributes and most constraints. So, present the model using top-down abstractions. RATING Remove "Terminal" (M:1) Objects ORM Abstractions DMODPRE 61 • Removing "Terminal" (M:1) Objects { 2000 .. 2999 } EMPLOYEE DEPT works in employs (number) BOSS (number) supervises is headed by reports to superior to ac "EmployeeSkill!" { 1000 .. 9999 } possesses possessed by <=5 SKILL (code) Remove Constraints and Reference Modes ORM Abstractions DMODPRE 62 • Removing Constraints and Reference Modes EMPLOYEE works in employs BOSS DEPT supervises is headed by reports to superior to possesses possessed by SKILL Remove Less Important Objects & Predicates – Subtypes, Objectified Predicates, Reflexive Relationships ORM Abstractions DMODPRE 63 • Removing Less Important Objects & Predicates – Subtypes, Objectified Predicates, Reflexive Relationships EMPLOYEE works in employs supervises is headed by SKILL Remove Predicates DEPT ORM Abstractions DMODPRE 64 • Removing Predicates EMPLOYEE SKILL DEPT ... Leaving BASE Entities! A Top-Level Abstract Conceptual Data Model an ER Diagram ? ! ! ! Language Design Criteria ORMQURY 65 See: Halpin, “Conceptual Queries”. • Semantic Strength, Expressiveness – Able to model all relevant details in the domain – The range of queries that can be expressed – The “100% Principle” • Semantic Clarity – Ease of Understanding and Use; intuitive – Unambiguous, i.e., only one possible meaning • Semantic Relevance – Only relevant information need be stated – Not dependent on artificial or spurious expressions • Semantic Stability, Independence – How well the model/query retains its original intent in the face of changes to the underlying application Conceptual Query Language ORMQURY 66 See: Halpin, “Conceptual Queries”. • ConQuer – Based on ORM – Need not be familiar with ORM or its notation “user can construct a query without any prior knowledge of the schema” but… – In the form of a textual outline - Indentation is significant – Implemented in Visio ActiveQuery - Object pick list – drag to the query window - Roles pick list – drag to the query window – Projection – items to display marked with a tick () – Mapping to SQL Sample ConQuer Query (1) ORMQURY 67 See: Halpin, “Conceptual Queries”. “List Employees who live in the City that is the Location of Branch 52” Employee (number) Branch (number) lives in / City is located in / is location of CityName / has U State Employee [number] / is in (code) +– lives in City +– is location of Branch [number =] 52 NOTE: City acts as a Join object type (the common “attribute”), i.e. Employee and Branch are joined through City. Semantic clarity (+), semantic relevance (+), semantic stability (+). SQL for Sample ConQuer Query (1) ORMQURY 68 See: Halpin, “Conceptual Queries”. “List Employees who live in the City that is the Location of Branch 52” Employee lives in / (number) Branch (number) is located in / is location of CityName In SQL: (Where are the tables?) City / has U State (code) / is in SELECT EmployeeNumber FROM Employee, Branch WHERE Employee.CityName = Branch.CityName and Employee.StateCode = Branch.StateCode and Branch.BranchNumber = 52 Could you do this in Access using the Query Form? Semantic clarity (-), semantic relevance (-), semantic stability (-) Suppose an Employee could live in more than one City??? Suppose we now wish to record the Population of Cities??? Problems with ER Modeling - Summary ORMvER 69 • Too much clustering; attributes in the wrong place • Ignores (presumes) intra-record structure (that is, inter-attribute relationships) • Human modeler is responsible for normalization remedy is always record decomposition • Attribute migration… to become an entity - modeler must distinguish attributes and entities • Naming columns = domain + role, loses domain objects • Modeling dilemma: – Complete representation of an entity object - more clustering – Full normalization (1NF) – decomposition, less clustering • Indirect representation of M:N relationships – Introduces artificial “new” entities • Difficulty representing Ternary relationships • Stability of the query language (SQL) At the Root, ORMvER 70 What’s wrong with ER Modeling? CLUSTERING Gordon C. Everest Carlson School of Management University of Minnesota Why NIAM/OR Modeling? ORMINTRO 71 • roots in both LOGIC & LINGUISTICS • based on one modeling construct: the fact sentence • more expressive, understandable - diagrams & verbalization • diagrams can be populated with actual data samples • abstraction levels equivalent to E-R modeling • more, richer semantics (than E-R, EER, IDEF1X) • capture and represent all functional dependencies • avoids normalization problems with record-based modeling • better meets criteria for good data modeling • organizations that switched wouldn’t go back to E-R • direction of Standards (SUMM, UDM, ...) • now supported with a viable PC-based CASE tool Resources on ORM ORMvER 72 BOOK: • Terry Halpin (now from Microsoft), Information Modeling and Relational Databases: From Conceptual Analysis to Logical Design, Morgan Kaufmann Publishers, San Francisco, 2001, 763 pages. WEB SITE for my course: • http://webfoot.csom.umn.edu/faculty/everest/idsx431 – with ORM intro and further reading – InfoModeler software download – Usage Notes SPRING CLASSES: • IDSc 6431 (for MBAs) • IDSc 4431 (for CSOM Undergrads) • IDSc 4131 (for CCE and others) TRAINING and CONSULTING: • InConcept, Inc., Lake Elmo, MN www.inconcept.com