Enhanced E-R Model and Business Rules CS263 Lecture 4 Introduction The basic ER model was introduced in the mid 1970s Since then business relationships and business data have become more complex The term Enhanced Entity Relationship (EER) model refers to the extension of the original ER model Supertype/Subtype relationships Most important new modelling construct in the EER model Allows us to model a general entity type (the supertype) and then subdivide it into several specialised entity types (called subtypes) Each subtype inherits from its supertype and may have special attributes of its own Representing supertypes and subtypes • • • The supertype is connected with a line to a circle, which in turn is connected by a line to each subtype that has been defined (see Fig.) The ‘U’ shaped symbol on each line connecting a subtype to the circle indicates that the subtype is a subset of the supertype, and also indicates the direction of the relationship Attributes shared by all the entities are associated with the supertype, whilst attributes that are unique to a particular subtype are associated with that subtype Basic notation for supertype/subtype relationships An example: the EMPLOYEE supertype Suppose that an organisation has 3 basic types of employees: Hourly: Employee_Number, Employee_Name, Address, Date_Hired, Hourly_Rate Salaried: Employee_Number, Employee_Name, Address, Date_Hired, Annual_Salary, Stock_Option Contract consultants: Employee_Number, Employee_Name, Address, Date_Hired, Contract_Number, Billing_Rate Employee supertype Many attributes the same across 3 types, so we could define a supertype called EMPLOYEE, with subtypes for HOURLY_EMPLOYEE, SALARIED_EMPLOYEE and CONSULTANT (see Fig) Employee supertype with three subtypes All employee subtypes will have emp_no., name, address, and date-hired Each employee subtype will also have its own attributes When to use supertype/subtype relations Should consider using subtypes when either (or both) of the following conditions are present: 1. There are attributes that apply to some (but not all) of the instances of an entity type 2. The instances of a subtype participate in a relationship unique to that subtype Both are true in the following Fig., where PATIENT has two subtypes: OUTPATIENT and RESIDENT PATIENT (the primary key is PATIENT_ID) All patients have an Admit_Date and a Patient_Name Patient example Every patient is cared for by a RESPONSIBLE_PHYSICIAN who develops a treatment plan for the patient Each subtype also has unique attributes. Outpatients have a Checkback_Date, whilst residents have a Date_Discharged and a unique relationship that assigns each patient to a bed (this is a mandatory relationship, and each bed may or may not be assigned to a patient) Supertype/subtype relationships of patients Both outpatients and resident patients are cared for by a responsible physician Only resident patients are assigned to a bed Generalization and specialization Generalization: = the process of defining a more general entity type from a set of more specialized entity types. BOTTOM-UP Specialization = the process of defining one or more subtypes of the supertype, and forming supertype/subtype relationships. TOP-DOWN Following Figs. Shows Generalisation in a situation where we have 3 different entity types: CAR, TRUCK and MOTORCYCLE In second Fig. The more general entity type VEHICLE is shown Generalisation MOTORCYCLE is not shown as it does not satisfy conditions for a subtype discussed earlier – the only attributes of MOTORCYCLE are those that are common to all vehicles, there are no attributes specific to motorcycles. Further, MOTORCYCLE does not have a relationship to another entity type Example of generalization Three entity types: CAR, TRUCK, and MOTORCYCLE All these types of vehicles have common attributes Generalization to VEHICLE supertype So we put the shared attributes in a supertype Note: no subtype for motorcycle, since it has no unique attributes Specialisation In an example of specialisation, an entity type PART has identifier Part_No and other attributes Description, Unit_price, Location, Qty_On_Hand, Routing_Number and Supplier (a multivalued attribute as there may be more than one supplier) Some parts are internally Manufactured Parts whilst others are externally Purchased Parts (some parts are obtained from both sources – when the choice depends on factors such as manufacturing capacity, unit price of the parts etc.) Specialisation Some attributes apply to all parts regardless of source, others such as Routing_Number depend on the source as they apply only to Manufactured Parts. This suggests that PART should be specialised by defining the subtypes MANUFACTURED_PART and PURCHASED_PART A new relationship ‘Supplies’ has been created between PURCHASED_PART and SUPPLIER that allows users to more easily associate purchased parts with their suppliers The attribute Unit_Price is now associated with the relationship ‘Supplies’ so that the price may vary between suppliers Example of specialization Entity type PART Applies only to purchased parts Only applies to manufactured parts Specialization to MANUFACTURED PART and PURCHASED PART Created 2 subtypes Note: multivalued attribute was replaced by a relationship to another entity Completeness constraints Whether an instance of a supertype must also be a member of at least one subtype. Has two possible rules: 1:Total Specialization Rule: Yes (double line) In following Fig. A PATIENT must either be an OUTPATIENT or a RESIDENT PATIENT. Total specialisation is indicated by the double line extending from the PATIENT identity type to the circle Examples of completeness constraints Total specialization rule A patient must be either an outpatient or a resident patient Completeness constraints • Partial Specialization Rule: No (single line) An entity instance of the supertype is allowed not to belong to any subtype. • In the following Fig. If a VEHICLE is a car it will appear as an instance of CAR, and if a truck as an instance of TRUCK. • However, if the vehicle is a motorcycle it cannot appear as an instance of any subtype. This example of partial specialisation is specified by the single line from the VEHICLE supertype to the circle Partial specialization rule A vehicle could be a car, a truck, or neither Disjointness constraint Whether an instance of a supertype may simultaneously be a member of two (or more) subtypes. Has two possible rules: Disjoint Rule: An instance of the supertype can be only ONE of the subtypes. Following Fig. shows that at any one time a PATIENT must be either an outpatient or a resident patient but cannot be both – specified by the letter ‘d’ The subclass of a patient may change over time, but at any given time a patient is of only one type Examples of disjointness constraints Disjoint rule A patient can either be outpatient or resident, but not both Disjointness constraint • Overlap Rule: An instance of the supertype can simultaneously be a member of more than one of the subtypes. Some PARTs are both Manufactured and Purchased. An instance of PART is a particular Part Number (i.e. type of part) rather than the individual part itself (Part_No). Considering Part Number 4000, at a given time there may be 250 of this part to hand, of which 100 are manufactured and 150 are purchased. The overlap is specified by placing an ‘o’ in the circle • Double line suggests any part must be either a purchased part or a manufactured part, or it may simultaneously be both of these Overlap rule A part may be both purchased and manufactured Subtype discriminators • Attribute of the supertype whose values determine the target subtype(s) • Disjoint subtypes: a simple attribute with alternative values to indicate the possible subtypes. In the following Fig. A new attribute ‘Employee_Type’ has been added to the supertype to serve as a subtype discriminator. 3 values ‘H’ = Hourly, ‘S’ = Salaried, ‘C’ = Consultant. This is assigned the correct value when a new employee is added. Subtype discriminators The expression “Employee_Type = “ (the LHS of a condition statement) is placed next to the line leading from the supertype to the circle, with the value of the attribute that selects the appropriate subtype placed adjacent to the line leading to that subtype Subtype discriminator (disjoint rule) A simple attribute with different possible values indicating the subtype Subtype discriminators • Overlapping – a composite attribute whose subparts pertain to different subtypes. Each subpart contains a boolean value to indicate whether or not the instance belongs to the associated subtype. • In the following Fig. a new attribute Part_Type has been added to PART. Part_Type is a composite attribute with components ‘Manufactured?’ and ‘Purchased?’ Each of these attributes is a boolean variable, and can be combined as: Manufactures only = YN, Purchased only = NY, Purchased and manufactured = YY Subtype discriminator (overlap rule) A composite attribute with sub-attributes indicating “yes” or “no” to determine whether it is of each subtype Defining supertype/subtype hierarchies It is possible for any of the subtypes in these examples to have other subtypes defined on it A supertype/subtype hierarchy is a hierarchical arrangement of supertypes and subtypes, where each subtype has only one supertype In modelling the Human resources in a University, the most general entity type would be PERSON (sometimes called the root) Relevant attributes are SSN (Social Security Number – Identifier), Name, Address, Gender and Date_Of_Birth Defining supertype/subtype hierarchies Next we define all major subtypes of the root, here there are 3, EMPLOYEE, STUDENT and ALUMNUS (already graduated) A person may belong to more than one subtype (such as ALUMNUS and EMPLOYEE) so the overlap rule is used. Overlap allows for any overlap (3-way) so if certain combinations are not allowed a more refined supertype/subtype hierarchy would have to be developed to eliminate these Defining supertype/subtype hierarchies The next step is to evaluate whether any of the subtypes already defined qualify for further specialisation. Here employee has two subtypes, FACULTY and STAFF Because there may be types of employee other than Faculty and Staff, the partial specialisation rule is indicated. However, such an employee cannot be both Faculty and Staff at the same time, so the disjoint rule is indicated in the circle Defining supertype/subtype hierarchies Two subtypes are defined for student: GRADUATE_STUDENT and UNDERGRADUATE STUDENT Notice that total specialisation and the disjoint rule are specified Example of supertype/subtype hierarchy Entity clustering EER diagrams are difficult to read when there are too many entities and relationships Solution: group entities and relationships into entity clusters Entity cluster: set of one or more entity types and associated relationships grouped into a single abstract entity type Because an entity cluster behaves like an entity type, entity clusters and entity types can be further grouped to form a higher-level entity cluster Entity clustering Entity-clustering is a hierarchical decomposition of a macro-level view of the data model into finer and finer views, eventually resulting in the full, detailed data model. The first of the following Figs. Shows the completed data model with dashed lines drawn around possible entity clusters, and the second shows the final result of transforming this into an EER diagram of only entity clusters and relationships Entity clustering Entity clusters are formed: 1. By abstracting a supertype and its subtypes (see CUSTOMER) 2. By combining directly related entity types and their relationships (SELLING_UNIT, ITEM, MATERIAL, MANUFACTURING) An entity cluster can also be formed by combining a strong entity and its associated weak entity types (not shown here) Possible entity clusters for Pine Valley Furniture (PVF) Related groups of entities could become clusters EER diagram of PVF entity clusters More readable, isn’t it? Business rules Statements that define or constrain some aspect of the business. We have already seen some (rules about relationships between entity types [cardinality], supertype/subtype relationships and definitions about attributes and entities). There are 3 main types of business rules: 1. Derivation – rule derived from other knowledge 2. Structural assertion – rule expressing static structure of the organisation 3. Action assertion – rule expressing constraints/control of organizational actions Following Fig. Is an EER diagram to describe business rules EER depiction of business rules classification Business rules The ER model in the following Fig. Contains 4 entity types: FACULTY, COURSE, SECTION and STUDENT. SECTION is a weak entity type, because it cannot exist without the COURSE entity type The identifying relationship for SECTION is ‘Is_Scheduled’ and the partial identifier is Section_ID (a composite attribute) Only selected attributes are shown for the various entity types to simplify the diagram Stating a structural assertion Four examples are: 1. A course is a module of instruction in a particular subject area. This definition of the term course associates the two terms (module and subject area) 2. Student_Name is an attribute of student 3. A student may register for many sections, and a section may be registered for by many students – this fact states the participation of entity types in a relationship (Is_Registered) Stating a structural assertion 4. A faculty member is an employee of the University. Although not shown in the following Fig., this fact designates a supertype/subtype relationship. The type of facts above are called base facts, they are fundamental and cannot be derived from other terms or facts Derived facts A derived fact is a fact that is derived from business rules using an algorithm or inference A derived attribute is an example of a derived fact Two examples of derived facts follow: 1. Student_GPA (Grade Point Average) = Quality_Points/Total_Hours_Taken Where Quality_Points = sum [for all courses attempted] (Credit_Hours*Numerical_Grade) Student_GPA could be shown in the Fig. As a dashed oval connected to STUDENT Derived facts 2. A student is taught by the faculty assigned to the sections for which the student is registered – this fact can be derived by following the Is_Registered relationship from STUDENT or SECTION and then the Is_Assigned relationship from SECTION to FACULTY Stating an action assertion Action assertions deal with the dynamic aspects of the organisation and impose “must/must not” and “should/should not” constraints on handling data An action assertion is the property of some business rule (called the anchor object) for a data handling action (such as create, update, delete etc.) and it states how the other business rules (called corresponding objects) act on the anchor object Stating an action assertion Examples of action assertions are: A course (anchor object) must have a course name (corresponding object). Here the action is updating the course name property of a course A student (anchor object) must have a value of 2.0 or greater for their Student_GPA (corresponding object) to graduate (action) A student cannot register for (the anchor object is the Is_Registered relationship) a section of the course for which there is no qualified faculty (the corresponding object is the Is_Qualified relationship) Types of action assertions There are 3 ways of classifying action assertions: 1: Based on the type of result from the assertion – 3 subtypes: I. Condition, which states that if something is true then another business rule will apply - ‘if a course has a qualified faculty, then the students can register for a section of that course’ II. Integrity constraint, which states something that must always be true – ‘the date a faculty becomes qualified to teach a course cannot be after the semester in which the faculty is assigned to teach a section of that course’ Types of action assertions III. Authorisation, which states a privilege – ‘only department chairs can qualify a faculty to teach a course’ Types of action assertions 2. Based on the form of the assertion. There are 3 types of assertion: I. Enabler, which if true permits or leads to the existence of the corresponding object – ‘a faculty can be created once the faculty is qualified to teach at least one course’ II. Timer, which enables/disables or creates/deletes an action – ‘when a student has a GPA above 2.0 and a total credit hours above 125, then the student may graduate’ Here graduating does not occur because of the timer, but the timer enables the action to occur Types of action assertions III. Executive, which causes the execution of one or more actions. Can be thought of as a trigger for some action ‘when a student has a GPA below 2.0, then the student goes on academic probation’ Types of action assertions 3. Based on the rigor of the assertion. Looking at action assertions in this way yields 2 types of assertions: I. Controlling, which state that something must of must not be or happen (above examples fall into this category) II. Influencing, which are guidelines or items of interest for which a notification must occur – ‘when the number of students exceeds 90% of the capacity of that section, notify the responsible departmental chair’ – in this situation nothing is controlled (students may continue to register) but management wants to know that a particular condition has occurred Data model segment for class scheduling Representing and enforcing business rules The modern approach is to declare action assertions at a conceptual level without specifying how the rule will be implemented Can do this graphically (EER notation) or using structured grammar Business rule 1 ‘For a faculty member to be assigned to teach a section of a course, they must be qualified to teach the course for which that section is scheduled’ The anchor object is the relationship Is_Assigned, and we are constraining the assignment of the faculty member to the section – the following Fig. Shows a dashed line from Is_Assigned to the action assertion symbol So in this case there are 2 corresponding objects, represented by the dashed lines from the action assertion symbol to each of the two relationships Business Rule 1: For a faculty member to be assigned to teach a section of a course, the faculty member must be qualified to teach the course for which that section is scheduled Corresponding object In this case, the action assertion Action assertion is a Restriction Anchor object Corresponding object Business rule 2 ‘For a faculty member to be assigned to teach a section of the course, the faculty member must not be assigned to teach a total of more than 3 course sections’ This rule imposes a limitation on the total number of sections a faculty member may teach at a given time. Since the rule involves a total, it requires a modification of the previous notation. As shown in the following Fig., the anchor object is again the relationship Is_Assigned Business rule 2 However, in this case the corresponding object is also Is_Assigned – it is a count of the total sections assigned to the faculty member The letters ‘LIM’ in the action assertion symbol stand for ‘limit’ The arrow leaving this symbol then points to a circle with the letter ‘U’ (upper). The second circle then contains the number 3, which is the upper limit. Business rule 2 The constraint reads as ‘The corresponding object is a count of the number of sections assigned to the faculty member, which has an upper limit of 3’ If a faculty member is already assigned 3 sections, any transaction that attempts to add another section will be rejected Business Rule 2: For a faculty member to be assigned to teach a section of a course, the faculty member must not be assigned to teach a total of more than three course sections In this case, the action assertion is an Corresponding object Upper LIMit Anchor object Action assertion Business rules in SQL You can implement business rules such as those above using the SQL language, and store the rules as part of the database definitions One way is to use the CREATE assertion statement, for Business rule 2 we could write: CREATE ASSERTION Overload_Protect CHECK (SELECT COUNT(*) FROM ASSIGNED WHERE Faculty_ID = ‘12345’) <= 3; Schema definition The internal schema of a relational database can be controlled for processing and storage efficiency. Techniques include Choosing to index primary and/or secondary keys to increase the speed of row selection, table joining, and row ordering. Dropping indexes to increase speed of table updating. Selecting file organizations for base tables that match the type of processing activity on those tables (e.g. keeping a table sorted by a frequently used reporting sort key) Schema definition Selecting file organizations for indexes (which are also tables) appropriate to the way the indexes are used - and allocating extra space for an index file so that an index can grow without having to be reorganised. Clustering data, so that related rows of frequently joined tables are stored close together in secondary storage to minimise retrieval time Maintaining statistics about tables and their indexes, so that the DBMS can find the most efficient ways to perform various database operations Creating indexes To speed up random/sequential access to base table data. Although users do not directly refer to indexes when writing SQL commands, the DBMS recognises which existing indexes would improve query performance. In this example an alphabetical index on the CUSTOMER_NAME field in the CUSTOMER_T table is created: CREATE INDEX NAME_IDX ON CUSTOMER_T(CUSTOMER_NAME); Creating indexes Indexes may be created or dropped at any time. If data already exists in the key column(s), index population will automatically occur for the existing data If an index is defined as UNIQUE (CREATE UNIQUE INDEX) and the existing data violate this condition, the index creation will fail Creating indexes Once an index is created, it will be updated as data are entered, updated or deleted When we no longer need an index we can use the DROP statement, for example to remove the index on the customer name in the CUSTOMER table: DROP INDEX NAME_IDX; The SELECT Statement Is used for queries on single or multiple tables. It has the following clauses: SELECT - Lists the columns (and expressions involving columns) from base tables or views to be projected into the table that will be the result of the command FROM - Identifies the table(s) or view(s) from which columns will be chosen to appear in the result table, and includes the tables or views needed to join tables to process the query WHERE - Includes the conditions for row selection within a single table or view, and the conditions between tables or views for joining SELECT The first two are required, and the third is necessary when only certain table rows are to be retrieved, or multiple tables are to be joined Following Fig. Shows how we can display product name and quantity from the PRODUCT view for all products costing less than $275 SELECT Example Find products with standard price less than $275 SELECT PRODUCT_NAME, STANDARD_PRICE FROM PRODUCT_V WHERE STANDARD_PRICE < 275 Comparison Operators in SQL Distinct and * If the user does not want to see duplicate rows in the result, SELECT DISTINCT can be used, so SELECT DISTINCT PRODUCT_NAME would display a results table without duplicate rows SELECT * (where * is a wildcard to indicate all columns) displays all columns from all the tables or views in the FROM clause Using expressions These are mathematical manipulations of the data in the table Some are stored functions, such as SUM or AVG e.g. what is the average standard price for each product in inventory (to be put in the result AVERAGE)? SELECT AVG (STANDARD_PRICE, AS AVERAGE FROM PRODUCT_V; Using functions Function such as COUNT, MIN, MAX, SUM an AVG of specified columns in the column list of a SELECT command may be used to specify that the resulting answer table is to contain aggregated data instead of row-level data. Using any of these aggregate functions will give a one-row answer. e.g. using the COUNT aggregate function to find totals, asking how many different items were ordered on order number 1004 SELECT COUNT(*) FROM ORDER_LINE_V WHERE ORDER_ID = 1004; Note: with aggregate functions you can’t have single-valued columns included in the SELECT clause Using functions It is easy to confuse the functions COUNT(*) and COUNT COUNT(*) counts all rows selected by a query, regardless of whether any of the rows contain null values COUNT tallies only those rows that contain a value, it ignores null values SUM and AVG can only be used with numeric columns COUNT, COUNT(*), MIN and MAX can be used with any data type. e.g. using MIN on a text column will find the lowest value in the column - the one which is closest to the beginning of the alphabet Using wildcards Wildcards can also be used in the WHERE clause if an exact match is not possible Here the keyword LIKE is paired with wildcard characters and usually a string containing the characters that are known to be the desired matches The wildcard character ‘%’ is used to represent any collection of characters e.g. using LIKE ‘%DESK’ when searching PRODUCT_DESCRIPTION will find all the different kinds of desks Using wildcards The underscore ,_, is used to represent exactly once character So using LIKE ‘_-drawer’ when searching PRODUCT_NAME will find any products with specified drawers, such as ‘3-drawer’, ‘5-drawer’ etc. Using Boolean operators AND, OR, and NOT Operators for customizing conditions in WHERE clause. NOT is evaluated first, then AND, then OR The following query lists product name, finish and unit price for all desks and all tables that cost more than $300 in the PRODUCT view: SELECT PRODUCT_DESCRIPTION, PRODUCT_FINISH, STANDARD_PRICE FROM PRODUCT_V WHERE (PRODUCT_DESCRIPTION LIKE ‘%Desk’ OR PRODUCT_DESCRIPTION LIKE ‘%Table’) AND UNIT_PRICE > 300; In and not in lists To match a list of values, use IN e.g. list all customers who live in the warmer American states: SELECT CUSTOMER_NAME, CITY, STATE FROM CUSTOMER_V WHERE STATE IN (‘FL’, ’TX’, ‘CA’, ‘HI’); Sorting results • ORDER BY - sorts the final result rows in ascending or descending order • GROUP BY - groups rows in an intermediate results table where the values in those rows are the same for one or more columns • HAVING - can only be used following GROUP BY and acts as a secondary WHERE clause, returning only those groups which meet a specified condition ORDER BY e.g. list customer, city and state for all customers in the CUSTOMER view whose address is in Florida, Texas, California or Hawaii. List the customers alphabetically by state, and alphabetically by customer within each state: i.e. sort the results first by STATE, and within a state by CUSTOMER_NAME: SELECT CUSTOMER_NAME, CITY, STATE FROM CUSTOMER_V WHERE STATE IN (‘FL’, ‘TX’, ‘CA’, ‘HI’) ORDER BY STATE, CUSTOMER_NAME; GROUP BY Particularly useful when paired with aggregate functions such as SUM or COUNT GROUP BY divides a table into subsets (by groups). Then an aggregate function can be used to provide summary information for that group Scalar aggregate: a single value returned from SQL query using an aggregate function Vector aggregate: multiple values returned from SQL query with aggregate function (via GROUP BY) GROUP BY e.g. count the number of customers with addresses in each state to which we deliver: SELECT STATE, COUNT(STATE) FROM CUSTOMER_V GROUP BY STATE; GROUP BY It is also possible to nest groups within groups, the same logic is used as when sorting multiple items e.g. count the number of customers with addresses in each city to which we deliver. List the cities by state: SELECT STATE, CITY, COUNT(CITY) FROM CUSTOMER_V GROUP BY STATE, CITY; In general, each column referenced in the SELECT statement must be referenced in the GROUP BY clause, unless the column is an argument for an aggregate function included in the SELECT clause HAVING The HAVING clause acts like a WHERE clause, but it identifies groups that meet a criterion, rather than rows. Therefore we usually see a HAVING clause following a GROUP BY e.g. find only states with more than one customer: SELECT STATE, COUNT(STATE) FROM CUSTOMER_V GROUP BY STATE HAVING COUNT(STATE) > 1; Using WHERE here would not work, because WHERE does not allow aggregates - also WHERE qualifies rows, whereas HAVING qualifies groups Using all 6 clauses e.g. list the product finish and average standard price for each finish for selected finishes where the average standard price is less than $750: SELECT PRODUCT_FINISH, AVG(STANDARD_PRICE) FROM PRODUCT_V WHERE PRODUCT_FINISH IN(‘Cherry’, ‘Natural Ash’, ‘Natural Maple’, ‘White Ash’) GROUP BY PRODUCT_FINISH HAVING AVG(STANDARD_PRICE) < 750 ORDER BY PRODUCT_FINISH; Using all 6 clauses The following Fig. Shows the order in which SQL processes the clauses of a statement Arrows indicate the paths that may or may not be followed Only the SELECT and FROM clauses are mandatory The processing order is different from the syntax order As each clause is processed an intermediate results table is produced that will be used for the next clause (users do not see the intermediate tables, only the final results) SQL statement processing order