An Analysis of Cardinality Constraints in Redundant Relationships

James Dullea and Il-Yeol Song
College of Information Science and Technology
Drexel University
Philadelphia, Pennsylvania 19104
Email: james.dullea@phl.boeing.com and songiy@post.drexel.edu

Abstract

In this paper, we present a complete analysis of redundant relationships in the entity-relationship model. Existing approaches use the concept of functional dependencies for identifying redundancy but ignore minimum cardinality constraints that carry important information about the structure of the model. Research literature on the topic is rare and is usually confined to the ‘Many to One’-‘mandatory participation’ case. Our approach differs from previous works in that we consider both maximum and minimum cardinality constraints to analyze the 4096 cases required to perform a complete study. Our approach first looks at the maximum cardinality constraints to develop a set of general rules to identify groups of trivial and ambiguous structures, and then we give greater consideration to the minimum cardinality constraints in those groups that require detailed investigation. With this approach we have provided a thorough pattern analysis of redundant relationships from both a structural and a semantic view. This paper focuses on a complete and thorough analysis of a binary relationship that is redundant with respect to the composite of two binary relationships, and establishes inferences that can extend this analysis to the more complex ‘n-relationship path’ case. We provide a complete set of heuristics for identifying redundant relationships that can be easily applied by data modelers and systems analysts.

1. Introduction

Entity-relationship (ER) modeling [CHEN76] is the foundation of various analysis and design methodologies for the development of relational databases, object oriented databases, and object modeling.
A key measure of success in the design of these models is that they afford the accurate storage of information without unnecessary redundancy. Redundancy exists in two forms: redundant data and redundant relationships. Data redundancy has received the majority of attention in both the everyday development of commercial databases and the research literature. From the very beginnings of entity-relationship modeling and normalization, the driving force in the development of design techniques and methodologies was the reduction of data redundancy.

Less well known is the concept of a redundant relationship. Teorey et al. [TEOR86] offer a short definition by stating that relationships that represent the same concept are considered to be redundant. They indicate that redundant relationships should be eliminated because they are “likely to result in unnormalized relations when transforming the model in relational schemas”. They use an example of a transitive dependency to make their argument. Figure 1 is a reproduction from the [TEOR86] article.

[Diagram: entities STUDENT, CLUB, and SCHOOL; relationships ATTENDS, BELONGS_TO, and LOCATED_IN]
FIGURE 1: Transitive relationships (Figure 5 in [TEOR86])

In this example the minimum cardinality (participation) is mandatory for each relationship and the maximum cardinality is ‘Many To One’ between STUDENT and CLUB, between STUDENT and SCHOOL, and between CLUB and SCHOOL. [TEOR86] states that the relationship ATTENDS is redundant because there exists a transitive dependency between BELONGS_TO and LOCATED_IN. Briefly stated, ‘if the CLUB determines the SCHOOL and the STUDENT determines the CLUB, then using this path the STUDENT can determine the SCHOOL’. If this is true then it seems that the relationship ATTENDS is really unnecessary and therefore redundant.
Segev, a year later, in a corrigenda to [TEOR86], challenged the redundancy of the ATTENDS relationship, stating that an additional semantic constraint is required which confines the student to belonging to a CLUB that is associated only with the school they attend [SEGE87]. Both Teorey and Segev agree that identifying redundant relationships must be done very cautiously and that strong consideration must be given to both structural consistency and semantic relevance. For two relationships to be redundant they must be structurally consistent, semantically related, and unambiguous.

We believe that approaching the problem using occurrence diagrams, supported in our detailed analysis by matrix algebraic techniques to verify our results, will yield a set of useful heuristics for the complete analysis of redundant relationships. The inclusion of minimum and maximum cardinality constraints, coupled with the concept of a semantic connection constraint viewed from both its relatedness and its completeness, is key to our analysis. It enriches the semantic information in the model necessary to make inferences about the redundancy of a relationship beyond a simple suspicion of redundancy.

This paper is organized as follows. Section 2 introduces the notation and definitions used. Section 3 discusses previous research and its limitations, and indicates the approaches used to analyze the data. Section 4 explains the results of our analysis from both maximum and minimum cardinality constraints, and gives an informal development of the heuristics. Section 5 formalizes the rules for redundant relationships. Section 6 concludes and summarizes the results of the paper while presenting future research considerations.

2. Notation and Definitions

2.1 Notation

For uniformity in presentation we must briefly define some terminology that is key to the environment supporting redundancy. A data model can be represented by a diagram of entities interconnected by relationships.
This connectivity represents the mapping of the associated entities’ instances in the relationship [TEOR94]. Teorey uses rectangles to represent entities connected by straight lines to diamonds representing relationships. The maximum cardinality constraint is indicated by the fill of the diamond (see Figure 1). Minimum cardinality is represented by an open circle placed on the line between the entity and the relationship to indicate optional or partial participation. The absence of an open circle indicates mandatory or total participation. Figure 1 is an example of an Entity-Relationship (ER) diagram constructed in Teorey notation while Figure 2 is a similar diagram constructed using the Chen approach. Chen’s approach allows us to explicitly show both mandatory and optional cardinality constraints. Chen’s notation indicates maximum cardinality by placing an ‘M’ or ‘1’ for Many or One near the entity rectangle in the diagram.

[Diagram: entities STUDENT, CLUB, and SCHOOL; relationships ATTENDS, BELONGS_TO, and LOCATED_IN; relationship labels RSC, RSS, and RCS; ‘M’ and ‘1’ cardinality marks]
FIGURE 2: An Example of a Suspiciously Redundant Relationship (ATTENDS is Redundant)

For minimum cardinality, Chen’s notation specifically indicates optional or mandatory participation by using a whitened or blackened circle, respectively, on the line between the entity and the relationship. There are slight differences between Entity-Relationship diagramming techniques; see [SONG95] for a comprehensive comparison. We introduced the problem using Teorey notation because it was the original diagram in [TEOR86], but we will switch to Chen notation because it relates better to Appendix Table 2. Throughout this paper we will use Cmax as an abbreviation for maximum cardinality and Cmin for minimum cardinality.

2.2 Paths and Composite Relationships

An ER diagram is made up of a path of alternating entities and relationships.
We can define a subpath as a series of alternating entities and relationships starting with an initiating entity and ending with a terminating entity. If a path can start and end with the same entity then it is called a cyclic path. Figure 2 is an example of a cyclic path.

In a data model the use of Cmin and Cmax in a binary relationship represents a specific semantic meaning between two participating entities. There exists the case where the coupling of two or more binary relationships, taken as a whole, represents a specific semantic relationship greater than what is communicated by each single binary relationship. We call this a composite relationship: an implied relationship between the two outermost entities in the composite of two or more contiguous relationships that convey an additional meaning when taken together. In Figure 2, the relationships STUDENT/CLUB and CLUB/SCHOOL can be taken together as the composite relationship STUDENT/CLUB/SCHOOL, which implies additional information about Student and School. Figure 12 shows the composite of many relationships within a subpath; we define this as an ‘n-relationship path’.

2.3 Semantic Connection Constraint

In order for this inference to take place there must exist an additional semantic constraint imposed by the model with respect to the composite relationship. We call this constraint a semantic connection constraint and define it as a restriction on the mapping of data instances in the intermediate entities that forces them to carry transitive semantic information between the two outermost entities. In Figure 2, the assumption that the club to which the student belongs must be a club associated with the student’s school in the composite relationship STUDENT/CLUB/SCHOOL would be a semantic connection constraint, and it could communicate additional information about the student and the school beyond just STUDENT/CLUB and CLUB/SCHOOL information.
If the semantic connection constraint were missing we could make no inference about the student’s school through the club. In order for a semantic connection constraint to have a transitive property the component relationships must be semantically related, unambiguous, and complete.

2.4 Semantically Related, Unambiguous, Completeness

We define a semantic connection constraint to be semantically related when the constraint establishes an association between two contiguous relationships with the entity between them acting as a surrogate and carrying sufficient linking information to establish a connection between the outermost entities. In our example of the composite relationship STUDENT/CLUB/SCHOOL, the constraint on STUDENT/CLUB stating that each student must belong to one and only one club, coupled with the constraint on CLUB/SCHOOL that a club is associated with one and only one school, is not sufficient to establish a related semantic connection constraint. With only this constraining information, a student could possibly be paired with a club outside the school they attend. This lack of semantically related constraining information was the basis of the [SEGE87] corrigenda to [TEOR86]. Only when an additional semantic constraint is imposed on the composite relationship that associates the STUDENT/CLUB relationship with the CLUB/SCHOOL relationship is the semantic connection related.

Jones and Song addressed a similar issue in [JONE96] concerning the relatedness of binary relationships within a ternary relationship. They show there can exist multiple binary relationships, some of which are related to and supply additional semantic information about the ternary relationship.

The concept of relatedness alone does not allow us to analyze the redundant relationship sufficiently. The model must exhibit no ambiguity in its connectivity across the relational path.
For example, if two schools each have a chess club, it would be unclear how to associate a student with a unique school knowing only that the student belongs to a chess club.

The semantic connection constraint must also be complete. A relationship is complete when all the data instances being modeled by the connecting constraint are passed between intermediate entities sufficiently to conceptually represent the single composite relationship between the two outermost entities. If, in our previous example, only some of the students were required to belong to a club, then we could not infer any association between all students and the school they attend. The composite relationship would be related but incomplete. The participation across all relationships does not have to be mandatory for the semantic connection constraint to be complete; it only has to be sufficient. In Figure 2, only some of the clubs are associated with the students, but the semantic connection constraint is still complete because it sufficiently allows all of the student instances to be associated with their schools.

2.5 Redundant Relationship

We now have laid sufficient foundations to develop a definition of redundant relationships and identify the conditions necessary for the redundancy to exist. As mentioned earlier, [TEOR86] states that two relationships that represent the same concept are considered to be redundant. We amplify the word “relationships” to include both binary relationships and composite relationships. In a binary relationship the connectivity between the two entities is related, unambiguous, and complete by the very nature of the connecting relationship. In a composite relationship the overall transitive connectivity between the two outermost entities must be established through a semantic connection constraint that is related, unambiguous, and complete.
For two relationships to represent the same concept, the mappings of the data instances between the outermost entities must be identical. A single binary relationship occurring in a cyclic path is defined as a redundant relationship if there exists a composite relationship that completes the cyclic path and represents the same concept through a semantic connection constraint that is semantically related, structurally unambiguous, and sufficiently complete. Figure 2 is an example of a cyclic path containing a redundant relationship if the constraint that ‘all Students are required to belong to a Club that is associated only with the student’s School’ is imposed upon the composite relationship.

3. Approach

3.1 Existing Approaches

Research on the thorough analysis of redundant relationships is infrequently found in the literature. The available research depends heavily on semantic information, the functional dependency of data items, and Armstrong’s transitive rule to determine redundancy [AZAR86] [ORLO90] [WU92].

[AZAR86] presents an algorithm using functional dependencies, join dependency components, and inclusion dependencies to identify data and relationship redundancy. The algorithm uses renaming procedures that transform local properties into universal attributes while introducing inclusion dependencies to derive redundant items.

[ORLO90] uses a natural language interpreter to collect candidates for ‘elementary fact types’ of semantic constraints. Elementary fact types are related to data dependencies in order to develop functional dependencies. They use a concept called a ‘derived fact type’ that is derived from the elementary fact types and corresponding functional dependencies that are redundant.

[WU92] uses a vector approach to analyze redundancy. Relationships are expressed as vectors whose components are values of ones and zeroes depending on the functional dependencies of the candidate key in the connecting entities.
The resultant product of the vectors is compared against the vector of the suspiciously redundant relationship to determine possible redundancy.

3.2 Limitations of Existing Approaches

The above three approaches depend heavily on the use of functional dependencies to identify redundant relationships. There are two reasons why we believe that further exploration in this area is appropriate. First, functional dependencies do not take into consideration minimum cardinality constraints. Although these constraints are still available in the semantic information, they become difficult to discern where multiple relationships exist in the relationship path. Second, there is no evidence that the use of functional dependency methods can be applied to the analysis of composite relationships. We also feel that these previous methods and the current literature do not address the more complicated redundancy issues, such as the Many To Many scenarios, and that they are not readily transferable to other modeling techniques, such as object-oriented modeling.

We believe that the redundant relationship paradigm has not been fully explored, and that both minimum cardinality and maximum cardinality play an important role in identifying redundancies. In order to explain our concept we will introduce a generic example as shown in Figure 3, similar to the [TEOR86] example, that will be drawn upon throughout this paper.

[Diagram: entities A, B, and C; relationships RAB, RBC, and RAC]
FIGURE 3: A generic example (cardinality constraints not shown)

The center of our analysis will be the examination of the possible redundancy of relationship RAC with respect to composite relationship RABC. We will assume for our analysis that relationship RAB is semantically related to relationship RBC through entity B. This means that for each occurrence of Entity A associated with an occurrence in Entity B there exists an association with an occurrence in Entity C through Entity B that carries connectivity information from the occurrence in Entity A.
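As an informal illustration of the generic example, the requirement from Section 2.5 that the two paths map identical data instances can be sketched at the occurrence level. The instance names, the pairings, and the helper functions `compose` and `same_mapping` below are hypothetical and are not part of the original analysis; the sketch also simply assumes that RAB and RBC are semantically related.

```python
# Hypothetical occurrence data for the generic A/B/C example; the instance
# names and pairings are illustrative assumptions, not taken from the paper.
R_AB = {("a1", "b1"), ("a2", "b1"), ("a3", "b2")}
R_BC = {("b1", "c1"), ("b2", "c2")}
R_AC = {("a1", "c1"), ("a2", "c1"), ("a3", "c2")}


def compose(r_ab, r_bc):
    """Return the composite mapping RABC: pairs (a, c) linked through some b."""
    return {(a, c) for (a, b1) in r_ab for (b2, c) in r_bc if b1 == b2}


def same_mapping(r_ac, r_ab, r_bc):
    """True when RAC and the composite RABC contain identical (a, c) pairs."""
    return r_ac == compose(r_ab, r_bc)


print(same_mapping(R_AC, R_AB, R_BC))        # True for this data

# If a3 were directly related to c1 instead, the two paths would disagree.
R_AC_other = {("a1", "c1"), ("a2", "c1"), ("a3", "c1")}
print(same_mapping(R_AC_other, R_AB, R_BC))  # False
```

Agreement on one set of occurrences is only a necessary condition. The heuristics developed in Section 4 aim at conclusions that hold for the schema regardless of the particular data instances.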
It has been previously stated that without RAB and RBC being semantically related the question of redundancy between relationships RAC and RABC is moot, and thus further analysis is not required. With that assumption in mind our focus will be on the ambiguity and completeness of both relationships (RAC and RABC) with respect to maximum and minimum cardinality constraints.

4. Analysis

In our simple ABC example there are 4096 different combinations based on both Cmax and Cmin. There are four variations of Cmax (1:1, M:1, 1:M, and M:N) for each of three relationships, yielding 4^3 (or 64) possible combinations. Appendix Table 1 shows the 64 Cmax combinations for the generic example in Figure 3. We will refer to each entry in Appendix Table 1 as Cmax Group 1 through 64. Cmin has two possible variations (mandatory participation or optional participation) on each of the two sides of each of the three relationships, giving 2^6 (or 64) possible combinations. Cmax and Cmin taken together yield 64 times 64, or 4096, combinations.

Our first objective was to identify trivial patterns at the Cmax level to reduce the number of combinations to a manageable task. Howe identifies a connection trap with respect to maximum cardinality constraints that he calls ‘the fan trap’ [HOWE89].

4.1 The Fan Rule (FAN)

The fan trap exists when a composite relationship contains an intermediate entity that has two opposing ‘M:1’ cardinality constraints. The two relationships fan out (M:1-1:M) with respect to the maximum cardinality constraint from the intermediate entity. An example of a fan relationship is shown in Figure 4 where the diagram represents ‘many employees can belong to one department and a department has the responsibility for many projects’.
[Diagram: entities EMPLOYEE, DEPARTMENT, and PROJECT; relationships RED (M:1) and RDP (1:M)]
FIGURE 4: An example of a FAN relationship

A quick look at this composite relationship might lead one to infer that a relationship can be developed between an employee and a project, but with this structure no such inference can be made. Although there is a connectivity between them, an occurrence of an employee with a department does not uniquely identify a project occurrence because department does not relate employee to a project. The presence of a fan relationship in a path renders the path ambiguous with respect to inferring connectivity between outermost entities. We therefore can make the statement that if a fan relationship exists in either path then no inference can be made about redundancy of the relationships.

A ‘Many to Many’ relationship between two entities can be decomposed into two ‘Many to 1’ relationships [HOWE89]. The mechanics are quite simple and many references are available [BRUC92] [HOWE89] [SHEP90]. Briefly, the relationship between the two original entities is converted into a surrogate entity with a ‘MANY’ cardinality on each side. The original entities are connected to the surrogate entity with ‘ONE’ cardinality constraints. Since the decomposition of the M:N relationship yields 1:M and M:1 relationships, any connectivity with a 1:M relationship on the ‘ONE’ side will result in a fan relationship.

The fan rule states that if a fan relationship or ‘a M:N relationship coupled with a M:1 relationship on the ONE side’ exists in either path then no inference can be made about the redundancy of the relationships. Applying the fan rule coupled with M:N decomposition, Cmax Groups 10, 12, 14, 16, 26, 28, 30, 32, 42, 44, 46, 48, 58, 60, 62, and 64 from Appendix Table 1 can be eliminated from redundancy consideration because of ambiguity.
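The size of the search space and the effect of the fan rule can be checked with a short enumeration. This is a minimal sketch under stated assumptions: the string encoding of Cmax values and the helper names `decompose` and `has_fan` are hypothetical, and groups are written as (RAC, RAB, RBC) tuples rather than with the numbering of Appendix Table 1, which is not reproduced here.

```python
from itertools import product

CMAX = ("1:1", "M:1", "1:M", "M:N")
CMIN = ("mandatory", "optional")


def decompose(path):
    """Replace each M:N step by its 1:M, M:1 decomposition."""
    flat = []
    for step in path:
        flat.extend(("1:M", "M:1") if step == "M:N" else (step,))
    return flat


def has_fan(path):
    """True if an intermediate entity is on the ONE side of both neighbours."""
    flat = decompose(path)
    return any(a == "M:1" and b == "1:M" for a, b in zip(flat, flat[1:]))


# All (RAC, RAB, RBC) maximum-cardinality combinations for Figure 3, and all
# minimum-cardinality combinations (two sides on each of three relationships).
groups = list(product(CMAX, repeat=3))
cmin_cases = list(product(CMIN, repeat=6))
print(len(groups), len(cmin_cases), len(groups) * len(cmin_cases))  # 64 64 4096

# RAC is a single relationship, so only the composite path RAB-RBC can fan.
fan_groups = [g for g in groups if has_fan(g[1:])]
print(len(fan_groups))          # 16
print(has_fan(["M:1", "1:M"]))  # True  (the EMPLOYEE/DEPARTMENT/PROJECT case)
print(has_fan(["1:M", "M:1"]))  # False
```

The count of 16 agrees in number with the groups listed for the fan rule; matching individual group numbers would require the ordering used in Appendix Table 1.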
4.2 The Many to Many Rule (MMR)

Taking a closer look at ‘Many to Many’ relationships, we came to the conclusion that any path containing a M:N relationship is ambiguous, with three trivial exceptions. First, if relationship RAC is M:N and relationship RABC represents the decomposition of RAC (RAB being 1:M and RBC being M:1) then relationship RAC obviously would be redundant to RABC. Second, if relationship RAC is M:N and relationship RAB is 1:1 with relationship RBC being M:N and identical to RAC, then relationship RAC is obviously redundant to relationship RABC. The third is similar to the second special case, with relationship RAB being M:N and relationship RBC being 1:1. In these three special cases the Cmin must be mandatory participation for all relationships; otherwise they would fall into the ambiguous category.

We concluded that a path containing a ‘Many to Many’ relationship is ambiguous with respect to our redundancy analysis because there is no functional dependency constraint between the two entities [JONE96] [TEOR86]. Without some functional dependency the question of redundancy could only be resolved at the data instance level. This would mean that the state of redundancy could change on a per-update basis and would depend on the data itself (for example, on which courses the students picked). Our objective is to identify a stable set of consistent heuristics concerning redundancy irrespective of the data instances. We therefore can define the Many-to-Many Rule, stating that if a ‘Many to Many’ relationship exists in either path then no inference can be made about redundancy of the relationships.

Applying the ‘Many to Many’ rule, Cmax Groups 4, 13, 14*, 15, 16*, 20, 28, 29, 30*, 31, 32*, 36, 39, 40, 45, 46*, 47, 48*, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58*, 59, 60*, 61, 62*, 63, and 64* from Appendix Table 1 can be eliminated from redundancy consideration because of ambiguity ( * indicates also eliminated by the fan rule).
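The Many-to-Many Rule and its three exceptions can be sketched in the same style. The encoding below is an assumption made for illustration: groups are (RAC, RAB, RBC) tuples, `eliminated_by_mmr` is a hypothetical helper, and only Cmax is examined, so the requirement that the exceptions have mandatory participation throughout is not tested.

```python
from itertools import product

CMAX = ("1:1", "M:1", "1:M", "M:N")

# The three exceptions from Section 4.2, written as (RAC, RAB, RBC).
MMR_EXCEPTIONS = {
    ("M:N", "1:M", "M:1"),  # RABC is the decomposition of RAC
    ("M:N", "1:1", "M:N"),  # RBC is M:N and identical to RAC
    ("M:N", "M:N", "1:1"),  # RAB is M:N and identical to RAC
}


def eliminated_by_mmr(group):
    """True if the group contains an M:N relationship and is not an exception."""
    return "M:N" in group and group not in MMR_EXCEPTIONS


groups = list(product(CMAX, repeat=3))
mmr_groups = [g for g in groups if eliminated_by_mmr(g)]
print(len(mmr_groups))  # 34
```

Of the 64 groups, 37 contain at least one M:N relationship; removing the three exceptions leaves 34, which agrees in number with the groups listed above.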
4.3 The Directional Rule (DIR)

4.3.1 Directional Constraints (DIR1-Expanding and Contracting)

We define a directional constraint as the requirement that both relational paths be expanding or contracting in the same direction with respect to Cmax. For example, in Figure 3, if relationship RAC has the cardinality of M:1, then we could consider that the composite relationship RABC is a suspiciously redundant relationship if the Cmax for RABC is ‘M:1-M:1’, ‘M:1-1:1’ or ‘1:1-M:1’. Both relationships would be contracting. On the other hand, if relationship RABC had a cardinality of ‘1:M-1:M’, ‘1:M-1:1’, or ‘1:1-1:M’ it would be expanding and redundancy would be impossible.

Since the maximum cardinality constraint only addresses occurrences that participate in the relationships, we can make directional constraint inferences with respect to the initiating entity or the terminating entity. At this point in our analysis we cannot make inferences about intermediate entities because the minimum cardinality constraint will play an important role in the connectivity between the outermost entities. We will address minimum cardinality later in this section. For now we will confine our rule to the initiating and terminating entities. We define the directional rule (DIR1) as follows: for two paths to be suspiciously redundant, the Cmax on the initiating and terminating entities of each path must be either expanding or contracting in the same direction, or must at least be constant (1:1).

4.3.2 Directional Constraints (DIR2-Constant)

In our definition of directional constraint we left out the possibility of a constant Cmax. We did this on purpose so as not to confuse expanding and contracting concepts with what we consider to be a special case. When combined with a M:1 relationship in a relationship path, a constant relationship has a neutral effect if the participation is mandatory or a more constraining effect if the participation is optional.
We will address these issues in the second half of this section. What we need to address is the situation of relationship RAC being ‘One to One’. Again we can only make inferences about the initiating and terminating entities. We can state a corollary to the above directional rule that applies when one path has a 1:1 cardinality constraint (or a series of 1:1 cardinality constraints). The directional rule (DIR2) states that if two paths are to be suspiciously redundant with one path having 1:1 cardinality then the Cmax in the other path must connect with the initiating and terminating entities on their ‘ONE’ sides. If the ‘Many’ side were connected to the initiating or terminating entity then we would have either an expanding or contracting Cmax and violate the previous directional rule (DIR1). It is acceptable to have a ‘Many’ constraint on an intermediate entity if and only if the Cmin is optional.

Applying the Directional Rule, Cmax Groups 2, 6, 7, 8, 9, 10*, 11, 12*, 17, 18, 21, 22, 23, 24, 33, 35, 32*, 36, 41, 42*, 43, and 44* can be eliminated from redundancy consideration because of ambiguity ( * indicates eliminated by the fan rule).

4.4 Summary of Cmax Analysis

Applying the Fan, Many-to-Many, and Directional Rules, we were able to identify that the following Cmax Groups were ambiguous and thus we are able to eliminate them from redundancy consideration. They are 2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24, 26, 28, 29, 30, 31, 32, 33, 35, 36, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, and 64 from Appendix Table 1. Also, in our consideration, groups 5, 34, 37, and 38 are the mirror images of groups 3, 25, 19, and 27, respectively. These four mirror-image groups can be eliminated from our Cmin analysis because they will yield the same results and be governed by the same rules as their mirror images. This leaves us with five groups to be explored. They are groups 1, 3, 19, 25, and 27.
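The expanding and contracting classification behind DIR1 (Section 4.3.1) can be sketched for the three-entity example as follows. The function names and string labels are hypothetical, and the sketch is deliberately partial: a path mixing M:1 and 1:M steps, or containing M:N, is labeled "mixed" and left undecided, so the DIR2 case of a ‘Many’ constraint on an intermediate entity is not modeled.

```python
def direction(path):
    """Classify a path of Cmax values as seen from the initiating entity."""
    kinds = set(path) - {"1:1"}  # 1:1 steps are treated as neutral
    if not kinds:
        return "constant"
    if kinds == {"M:1"}:
        return "contracting"
    if kinds == {"1:M"}:
        return "expanding"
    return "mixed"               # M:N or opposing steps: not decided here


def dir1_compatible(rac, rab, rbc):
    """True when RAC and the composite RAB-RBC share a non-mixed direction."""
    d_single, d_composite = direction([rac]), direction([rab, rbc])
    return d_single == d_composite and d_single != "mixed"


print(direction(["M:1", "1:1"]))             # contracting
print(direction(["1:1", "1:M"]))             # expanding
print(dir1_compatible("M:1", "1:1", "M:1"))  # True
print(dir1_compatible("M:1", "1:M", "1:M"))  # False
```

The examples reproduce the cases named in Section 4.3.1: with RAC at M:1, the composites M:1-M:1, M:1-1:1, and 1:1-M:1 remain candidates, while 1:M-1:M, 1:M-1:1, and 1:1-1:M do not.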
We believed that group 27 (shown in Figure 5), where the relationship RAC is M:1, RAB is M:1, and RBC is M:1, was the most important and that the other four groups were either special cases or subsets of group 27.

[Diagram: entities A, B, and C; relationships RAB, RBC, and RAC, each M:1]
FIGURE 5: ABC example for group 27 (maximum cardinality constraints shown)

4.5 Cmin Analysis of Group 27 (M:1-M:1-M:1 Cases)

As stated before there are 64 different Cmin possibilities. Appendix Table 2 shows the 64 cases to be analyzed in this section and Figure 5 shows our ABC example for group 27 with the cardinality constraints. The results of the analysis are presented in Appendix Table 2; each Cmin case is numbered from 1 to 64 to allow referencing. For each of the 64 Cmin cases we developed an occurrence diagram [ELMA94] to assist us in examining the structural constraints on the relationships. In analyzing the patterns, we found some phenomena of interest that required further research. A driving factor in the results was the concept of an ‘Initiating Entity’ and a ‘Terminating Entity’. The terms initiating and terminating are arbitrary according to how the diagram is viewed, but relative only to each other for analysis purposes.

In our ABC example of Cmax group 27, we found 23 of the 64 cases to be redundant. Examining both the group of 23 redundant cases and the 41 non-redundant cases, we found three structural patterns that allowed us to determine relationship redundancy status for Cmax group 27.

First, we found a rule that applied to the ‘Initiating Entity’. The Initiating Entity Rule (IER1) states that when the Cmin of the ‘Initiating Entity’ (Entity A) has mandatory participation in one subpath, the Cmin must be mandatory in the other subpath. Figure 6 shows a set of relationships that meet the Initiating Entity Rule (IER1) while Figure 7 shows a set of relationships that violate the rule. In Figure 7c the relationships are not transitive even though they are related, because they are not complete with respect to student A6, which does not participate in RAB.

FIGURE 6a: An Example of a Relationship Structure That Meets All Three Rules (RAC is Redundant)
FIGURE 6b: An Occurrence Diagram Showing That All Tuples (A1 thru A6) of the Student Entity Participate in the Relationship RAC
FIGURE 6c: An Occurrence Diagram showing a related and complete transitive relationship between entities A/B and B/C from Figure 6a
FIGURE 7a: An Example of a Relationship Structure That Violates the Initiating Entity Rule
FIGURE 7b: An Occurrence Diagram Showing That All Tuples (A1 thru A6) of the Student Entity Participate in the Relationship RAC
FIGURE 7c: An Occurrence Diagram showing a related but incomplete transitive relationship between entities A/B and B/C from Figure 7a; one tuple in entity A (A6) does not participate in the Relationship RAB

As a corollary to IER1, we also recognized that when the ‘Initiating Entity’ (Entity A) was partially participating in one subpath, it necessitated that in the other subpath either the Cmin for the initiating entity needed to be partial or the Cmin for Entity B (an intermediate entity) on the ‘MANY’ side needed to be partial. We called this IER2. Figure 8 shows a set of relationships that meet the Initiating Entity Rule (IER2) while Figure 9 shows a set of relationships that violate the rule. In Figure 9b the relationship RAC is unrelated to the composite relationship RAB and RBC shown in Figure 9c. They are unrelated because Figure 9b indicates that tuple A6 is not associated with entity C while Figure 9c forces a relationship through its association with entity B. The relationship RAC in this case is not redundant.
FIGURE 8a: An Example of a Relationship Structure That Meets All Three Rules (RAC is Redundant)
FIGURE 8b: An Occurrence Diagram Showing That Only Tuples (A1 thru A5) of Entity A Participate in the Relationship RAC
FIGURE 8c: An Occurrence Diagram showing a related and complete transitive relationship between A/B and B/C from Figure 8a. Complete in that A's tuples (A1 thru A5) are mapped to their respective C's tuples (C1 and C2) with respect to Figure 9b.
FIGURE 9a: An Example of a Relationship Structure That Violates the Initiating Entity Rule
FIGURE 9b: An Occurrence Diagram Showing That Only Tuples (A1 thru A5) of Entity A Participate in the Relationship RAC
FIGURE 9c: An Occurrence Diagram showing an unrelated transitive relationship between A/B and B/C with respect to Figure 9b. A6 is forced to participate in the relationship RAB with Entity B and, because of the total participation constraint of Entity B with Entity C, Entity A is forced to be associated with Entity C. The two subpaths have unrelated semantic meanings and are not redundant.

We also tested this rule for the generalized ‘n-relationship path’ using occurrence matrices and comparing the resultant matrix of each path, and found it to be true for mandatory participation. It was also true for the partial participation cases and required at least the initiating entity to be partial or at least one of the N-2 intermediate entities to be partial.
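The occurrence-matrix comparison used to test these rules can be sketched as a Boolean matrix product. The matrices below are small hypothetical data in the spirit of Figures 8 and 9, not a transcription of them, and `bool_matmul`, `as_bool`, and `paths_agree` are illustrative names.

```python
def bool_matmul(x, y):
    """Boolean matrix product: entry [i][k] is True if some j links i to k."""
    return [[any(x[i][j] and y[j][k] for j in range(len(y)))
             for k in range(len(y[0]))]
            for i in range(len(x))]


def as_bool(matrix):
    """Convert a 0/1 occurrence matrix to booleans for comparison."""
    return [[bool(v) for v in row] for row in matrix]


def paths_agree(r_ac, r_ab, r_bc):
    """True when RAC equals the composite occurrence matrix RAB x RBC."""
    return as_bool(r_ac) == bool_matmul(r_ab, r_bc)


# Hypothetical occurrences: rows are A1..A3 (or B1..B2), columns B1..B2 (or C1..C2).
R_AB = [[1, 0], [1, 0], [0, 1]]   # every A participates in RAB
R_AC = [[1, 0], [1, 0], [0, 0]]   # A3 does not participate in RAC (A is partial)

R_BC_total = [[1, 0], [0, 1]]     # every B participates in RBC
R_BC_partial = [[1, 0], [0, 0]]   # B2 does not participate in RBC

print(paths_agree(R_AC, R_AB, R_BC_total))    # False: A3 is forced through B2 to C2
print(paths_agree(R_AC, R_AB, R_BC_partial))  # True: partial B keeps A3 unmapped
```

For an n-relationship path the same product can be chained over the occurrence matrices along the path before comparing against the single relationship.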
Second, we found a similar rule for the ‘Terminating Entity’, which we called the Terminating Entity Rule (TER1). It states that when the Cmin of the ‘Terminating Entity’ has mandatory participation in one subpath, the Cmin must also be mandatory in the other subpath. Again we recognized that when the ‘Terminating Entity’ was partially participating in one subpath, then in the other subpath either the Cmin for the terminating entity needed to be partial or the Cmin for the intermediate entity on the ‘ONE’ side needed to be partial (TER2). Figure 10 shows a set of relationships that violate the Terminating Entity Rule. We again tested this rule for the generalized ‘n-relationship path’ and found it to be true for mandatory participation, and true for the partial participation group, which required at least the terminating entity to be partial or at least one of the N-2 intermediate entities to be partial.

FIGURE 10a: An example of a relationship structure that violates the Terminating Entity Rule.
FIGURE 10b: An occurrence diagram showing that only tuples C1 and C2 of entity C participate in the relationship RAC.
FIGURE 10c: An occurrence diagram showing that at least one of the tuples must participate with entity C's C3 because of the total participation constraint on RAB and RBC. This is not consistent with Figure 10b.

Of the 41 combinations that failed to be redundant, using both the occurrence matrix method and the occurrence diagrams, only two were not accounted for by a violation of either the “Initiating Entity Rule” or the “Terminating Entity Rule”. They were cases 9 and 29 (see Appendix Table 2).
Further analysis identified that if the Cmin of one of the intermediate entities was mandatory on the ‘ONE’ side and partial on the ‘MANY’ side, then the Cmin of the ‘Initiating Entity’ of the opposing path must be partial. This concept remained consistent with the other 39 cases. We call the application of this concept the Intermediate Entity Rule (MER). Figure 11 shows a set of relationships that violate the Intermediate Entity Rule. The rule was also expandable to an ‘n-relationship path’ diagram with one additional requirement: in the ‘n-relationship path’ model all the intermediate entities needed to be mandatory on the ‘ONE’ side.

FIGURE 11a: An example of a relationship structure that violates the Intermediate Entity Rule.
FIGURE 11b: An occurrence diagram showing the relationship RAC in Figure 11a.
FIGURE 11c: An occurrence diagram showing that the transitivity of A6 is not complete because of the participation constraint on RBC.

4.6 Cmin Analysis of Cmax Groups 1, 3, 19, and 25
Cmax groups 19 and 25 contain at least one ‘Many to One’ cardinality constraint in both of the relationship paths. We found that the rules developed from our analysis of Cmax group 27 were consistent with Cmax groups 19 and 25 without any modification, even with the introduction of the ‘One to One’ constraint. In Cmax groups 1 and 3 we found the case to be different. When one or both paths consist of a ‘One to One’ cardinality constraint (or a series of ‘One to One’ constraints), additional restrictions on the connectivity are introduced. We found that the rules for Cmax group 27 were still consistent for groups 1 and 3 with the following additional constraints. For Cmax group 1, the structures presented in cases 2 and 36 could not yield a redundant relationship.
For Cmax group 3, the structures presented in cases 1, 21, 35, 41, 43, 55, 61, and 63 also could not yield a redundant relationship. This required the development of an additional rule for the case in which a ‘One to One’ cardinality constraint exists in at least one path. We call this the One-to-One Rule (11R). With respect to Cmax group 1 we found that if one path is mandatory on the terminating entity side, then in the other path the intermediate entity’s cardinality constraint could not be optional on the initiating entity’s side and mandatory on the terminating entity’s side (11R1). With respect to Cmax group 3 we found that, in the path containing the M:1 cardinality, the initiating entity’s ‘One to One’ cardinality constraint on the intermediate entity side must be optional for a redundant relationship to exist (11R2).

5. Rules for Analyzing Redundant Relationships
The above analysis led to the development of a set of heuristics for two paths contained within a cyclic path (starting with E1 and ending with EN, with E2 ... EN-1 between them on only one of the paths). In order for two paths (Path X and Path Y) to be redundant to each other they must meet all of the following cardinality constraint rules:

FIGURE 12: An Example of a Redundant 'N-Relationship Path' (E1 is connected to EN directly by R1N, and through E2 ... EN-1 by R12, R23, ..., R(N-1)N).

Rules Dealing with Maximum Cardinality

The Fan Rule (FAN)
FAN   If a fan relationship, or ‘a M:N relationship coupled with a M:1 relationship on the ONE side’, exists in either path, then no inference can be made about the redundancy of one path to the other.

The Many-to-Many Rule (MMR)
MMR   If a non-trivial ‘Many to Many’ relationship exists in either path, then no inference can be made about the redundancy of one path to the other.

The Directional Rule (DIR)
DIR1  The maximum cardinality constraints on the initiating and terminating entities of each path must be in the same direction, or they must at least be constant (1:1), for a redundant relationship to be possible.
DIR2  If two paths are suspiciously redundant with one path having 1:1 cardinality, then the Cmax in the other path must connect with the initiating and terminating entities on their ‘ONE’ sides.

Rules Dealing with Minimum Cardinality

Initiating Entity Rule (IER)
IER1  If the Cmin associated with E1 for Path X is 1 (mandatory participation), then the Cmin associated with E1 for Path Y must also be 1.
IER2  If the Cmin associated with E1 for Path X is 0 (partial participation), then in Path Y either the Cmin of E1 must be 0 or the Cmin of at least one of E2 ... EN-1 on the “MANY” side must be 0.

Terminating Entity Rule (TER)
TER1  If the Cmin associated with EN for Path X is 1 (mandatory participation), then the Cmin associated with EN for Path Y must also be 1.
TER2  If the Cmin associated with EN for Path X is 0 (partial participation), then in Path Y either the Cmin of EN must be 0 or the Cmin of E2 ... EN-1 on the “ONE” side must be 0.

Intermediate Entity Rule (MER)
MER   If the Cmin associated with all of E2 ... EN-1 is 1 (mandatory participation) on the “ONE” side and the Cmin associated with at least one of E2 ... EN-1 is 0 (partial participation) on the “MANY” side, then the Cmin of E1 must be 0 (partial participation) in Path X.

One-to-One Rule (11R)
11R1  If both paths are a series of ‘One to One’ maximum cardinality constraints and one path is mandatory on the terminating entity side, then in the other path no intermediate entity’s cardinality constraint can be optional on the initiating entity’s side and mandatory on the terminating entity’s side.
11R2  If only one path is a series of ‘One to One’ maximum cardinality constraints, then in the path containing the ‘Many to One’ cardinality the initiating entity’s ‘One to One’ cardinality constraint on the intermediate entity side must be partial.

6. Conclusion
Entity-relationship diagramming has been the engineering foundation methodology of data modeling for over twenty years.
During that period, published research on redundant relationships has been very rare, and neither a complete cardinality analysis nor a set of heuristic rules has been available to guide analysts in real-world database modeling and design. We have performed a complete analysis of all cardinality constraints and developed eleven heuristic rules for deciding the redundancy of two relationship paths. The advantages of our method are that the heuristic rules can be easily and visually applied to all entity-relationship diagrams for deciding redundancy, and that they are complete in that they address all possible combinations of minimum and maximum cardinality constraints. Our approach of using occurrence diagrams is easily understood, repeatable, and independent of any data instances. We feel that previous methods relying only on functional dependency analysis were cumbersome and confusing to apply to the data model and, because they ignored the minimum cardinality constraint, allowed only the conclusion that a relationship was suspiciously redundant. The set of heuristic rules provided from our analysis of redundant relationships is complete and consistent for all cases, and readily applicable by data modelers and system analysts.

We believe that this analysis is a major step forward in the analysis of redundant relationships and provides an adequate foundation for further work with the entity-relationship model in the areas of composite relationships, the inclusion of ternary relationships in the model, the analysis of composite paths, and the existence of multiple sets of redundant relationships in a single cyclic path. The analysis can also be readily applied to object-oriented modeling, where most of the research emphasis is now being focused. Analysis of redundancy between classes with many different modeling constructs in the object model is a new area yet to be explored.

References
[AZAR86] Azar, N. and E. Pichat, 1986. “Translation of an extended entity-relationship model into the universal relation with inclusions formalism”, Entity-Relationship Approach: Ten Years of Experience in Information Modeling. Proceedings of the Fifth International Conference, pp. 253-60, Nov. 17-19, 1986.
[BRUC92] Bruce, Thomas A., 1992. Designing Quality Databases with IDEF1X Information Models, Dorset House Publishing, NY.
[CHEN76] Chen, Peter, 1976. “The Entity-Relationship Model - Toward a Unified View of Data”, ACM Transactions on Database Systems, 1(1)9-36, March 1976.
[ELMA94] Elmasri, Ramez and Shamkant B. Navathe, 1994. Fundamentals of Database Systems, 2nd Ed., The Benjamin/Cummings Publishing Co, Inc., Redwood City, CA.
[HOWE89] Howe, D. R., 1989. Data Analysis for Data Base Design, 2nd Ed., Edward Arnold, London, GB.
[JONE96] Jones, Trevor H., and Il-Yeol Song, 1996. “Analysis of Binary/Ternary Cardinality Combinations in Entity-Relationship Modeling”, Data & Knowledge Engineering, 19(1996)39-64.
[ORLO90] Orlowska, M. E. and Zhang Yanchun, 1990. “On enhancements of semantic methodologies for relational database design”, Databases in the 1990s. Proceedings of the Australian Database Research Conference, pp. 97-108, Feb. 6, 1990.
[SEGE87] Segev, Arie, 1987. “Transitive Dependencies”, in Surveyors’ Forum, Computing Surveys, 19(2)191-193.
[SHEP90] Shepherd, John C., 1990. Database Management, Theory and Application, Richard D. Irwin, Inc., Boston, MA.
[SONG95] Song, Il-Yeol, Mary Evans, and E. K. Park, 1995. “A Comparative Analysis of Entity-Relationship Diagrams”, Journal of Computer & Software Engineering, 3(4)427-459.
[TEOR86] Teorey, Toby J., Dongqing Yang, and James P. Fry, 1986. “A Logical Design Methodology for Relational Databases Using the Extended Entity-Relationship Model”, Computing Surveys, 18(2)197-222, June 1986.
[TEOR94] Teorey, Toby J., 1994. Database Modeling and Design - The Entity-Relationship Approach, Morgan Kaufmann Publishers, Inc., San Mateo, CA.
[WU92] Wu, J.Y.J., 1992. “A data modeling approach with E-R table for the representation of enterprise system”, International Journal on Information and Management Sciences, 3(1)79-100, June 1992.

Appendix Table 1: Maximum Cardinality Constraints for the ABC model and the rules applied to show ambiguity.

Cmax
Group  RAC  RAB  RBC  Rules applied to show ambiguity
1      1-1  1-1  1-1  Unambiguous
2      1-1  1-1  1-M  Directional Rule
3      1-1  1-1  M-1  Unambiguous
4      1-1  1-1  M-N  Many-to-Many Rule
5      1-1  1-M  1-1  Unambiguous
6      1-1  1-M  1-M  Directional Rule
7      1-1  1-M  M-1  Directional Rule
8      1-1  1-M  M-N  Directional Rule
9      1-1  M-1  1-1  Directional Rule
10     1-1  M-1  1-M  Fan Rule, Directional Rule
11     1-1  M-1  M-1  Directional Rule
12     1-1  M-1  M-N  Fan Rule, Directional Rule
13     1-1  M-N  1-1  Many-to-Many Rule
14     1-1  M-N  1-M  Fan Rule, Many-to-Many Rule
15     1-1  M-N  M-1  Many-to-Many Rule
16     1-1  M-N  M-N  Fan Rule, Many-to-Many Rule
17     M-1  1-1  1-1  Directional Rule
18     M-1  1-1  1-M  Directional Rule
19     M-1  1-1  M-1  Unambiguous
20     M-1  1-1  M-N  Many-to-Many Rule
21     M-1  1-M  1-1  Directional Rule
22     M-1  1-M  1-M  Directional Rule
23     M-1  1-M  M-1  Directional Rule
24     M-1  1-M  M-N  Directional Rule
25     M-1  M-1  1-1  Unambiguous
26     M-1  M-1  1-M  Fan Rule
27     M-1  M-1  M-1  Unambiguous
28     M-1  M-1  M-N  Fan Rule
29     M-1  M-N  1-1  Many-to-Many Rule
30     M-1  M-N  1-M  Fan Rule, Many-to-Many Rule
31     M-1  M-N  M-1  Many-to-Many Rule
32     M-1  M-N  M-N  Fan Rule, Many-to-Many Rule
33     1-M  1-1  1-1  Directional Rule
34     1-M  1-1  1-M  Unambiguous, Mirror Image of 25
35     1-M  1-1  M-1  Directional Rule
36     1-M  1-1  M-N  Many-to-Many Rule
37     1-M  1-M  1-1  Unambiguous, Mirror Image of 19
38     1-M  1-M  1-M  Unambiguous, Mirror Image of 27
39     1-M  1-M  M-1  Many-to-Many Rule
40     1-M  1-M  M-N  Many-to-Many Rule
41     1-M  M-1  1-1  Directional Rule
42     1-M  M-1  1-M  Fan Rule, Directional Rule
43     1-M  M-1  M-1  Directional Rule
44     1-M  M-1  M-N  Fan Rule, Directional Rule
45     1-M  M-N  1-1  Many-to-Many Rule
46     1-M  M-N  1-M  Fan Rule, Many-to-Many Rule
47     1-M  M-N  M-1  Many-to-Many Rule
48     1-M  M-N  M-N  Fan Rule, Many-to-Many Rule
49     M-N  1-1  1-1  Many-to-Many Rule
50     M-N  1-1  1-M  Many-to-Many Rule
51     M-N  1-1  M-1  Many-to-Many Rule
52     M-N  1-1  M-N  Many-to-Many Rule
53     M-N  1-M  1-1  Many-to-Many Rule
54     M-N  1-M  1-M  Many-to-Many Rule
55     M-N  1-M  M-1  Many-to-Many Rule
56     M-N  1-M  M-N  Many-to-Many Rule
57     M-N  M-1  1-1  Many-to-Many Rule
58     M-N  M-1  1-M  Fan Rule, Many-to-Many Rule
59     M-N  M-1  M-1  Many-to-Many Rule
60     M-N  M-1  M-N  Fan Rule, Many-to-Many Rule
61     M-N  M-N  1-1  Many-to-Many Rule
62     M-N  M-N  1-M  Fan Rule, Many-to-Many Rule
63     M-N  M-N  M-1  Many-to-Many Rule
64     M-N  M-N  M-N  Fan Rule, Many-to-Many Rule

Appendix Table 2: Minimum Cardinality Constraints for the ABC model and the rules applied to show structural redundancy for Group 27 (M:1, M:1, M:1).

Each relationship is M:1. The M and 1 columns give the Cmin of the entity on the MANY side and on the ONE side of that relationship (1 = mandatory participation, 0 = partial participation).

Cmin   RAC    RAB    RBC    Structurally  Rules
Case   M  1   M  1   M  1   Redundant     Violated
1      1  1   1  1   1  1   YES
2      1  1   1  0   1  1   YES
3      1  1   0  1   1  1   NO            IER1
4      1  1   0  0   1  1   NO            IER1
5      1  1   1  1   1  0   NO            TER1
6      1  1   1  0   1  0   NO            TER1
7      1  1   0  1   1  0   NO            IER1
8      1  1   0  0   1  0   NO            IER1
9      1  1   1  1   0  1   NO            MER
10     1  1   1  0   0  1   YES
11     1  1   0  1   0  1   NO            IER1, MER
12     1  1   0  0   0  1   NO            IER1
13     1  1   1  1   0  0   NO            TER1, MER
14     1  1   1  0   0  0   NO            TER1
15     1  1   0  1   0  0   NO            IER1
16     1  1   0  0   0  0   NO            IER1
17     1  0   1  1   1  1   NO            TER2
18     1  0   1  0   1  1   YES
19     1  0   0  1   1  1   NO            IER1
20     1  0   0  0   1  1   NO            IER1
21     1  0   1  1   1  0   YES
22     1  0   1  0   1  0   YES
23     1  0   0  1   1  0   NO            IER1
24     1  0   0  0   1  0   NO            IER1
25     1  0   1  1   0  1   NO            TER2, MER
26     1  0   1  0   0  1   YES
27     1  0   0  1   0  1   NO            IER1, MER
28     1  0   0  0   0  1   NO            IER1
29     1  0   1  1   0  0   NO            MER
30     1  0   1  0   0  0   YES
31     1  0   0  1   0  0   NO            IER1, MER
32     1  0   0  0   0  0   NO            IER1
33     0  1   1  1   1  1   NO            IER2
34     0  1   1  0   1  1   NO            IER2
35     0  1   0  1   1  1   YES
36     0  1   0  0   1  1   YES
37     0  1   1  1   1  0   NO            IER2, TER1
38     0  1   1  0   1  0   NO            IER2, TER1
39     0  1   0  1   1  0   NO            TER1
40     0  1   0  0   1  0   NO            TER1
41     0  1   1  1   0  1   YES
42     0  1   1  0   0  1   YES
43     0  1   0  1   0  1   YES
44     0  1   0  0   0  1   YES
45     0  1   1  1   0  0   NO            TER1
46     0  1   1  0   0  0   NO            TER1
47     0  1   0  1   0  0   NO            TER1
48     0  1   0  0   0  0   NO            TER1
49     0  0   1  1   1  1   NO            IER2, TER2
50     0  0   1  0   1  1   NO            IER2
51     0  0   0  1   1  1   NO            TER2
52     0  0   0  0   1  1   YES
53     0  0   1  1   1  0   NO            IER2
54     0  0   1  0   1  0   NO            IER2
55     0  0   0  1   1  0   YES
56     0  0   0  0   1  0   YES
57     0  0   1  1   0  1   NO            TER2
58     0  0   1  0   0  1   YES
59     0  0   0  1   0  1   NO            TER2
60     0  0   0  0   0  1   YES
61     0  0   1  1   0  0   YES
62     0  0   1  0   0  0   YES
63     0  0   0  1   0  0   YES
64     0  0   0  0   0  0   YES
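
The minimum cardinality rules can be checked mechanically against the 64 Cmin cases of Cmax group 27. The sketch below is one possible reading and makes several assumptions that are not stated in the paper: Path X is taken to be the direct relationship RAC and Path Y the composite of RAB and RBC, Cmin values are encoded as 1 (mandatory) and 0 (partial), and the case numbering is assumed to vary the RAB columns fastest and the RAC columns slowest. The function name `violated_rules` and the variable names are hypothetical.

```python
from itertools import product


def violated_rules(a_ac, c_ac, a_ab, b_ab, b_bc, c_bc):
    """List the Cmin rules violated by one case of the M:1, M:1, M:1 structure.

    a_ac, c_ac: Cmin of A (MANY side) and C (ONE side) in RAC
    a_ab, b_ab: Cmin of A (MANY side) and B (ONE side) in RAB
    b_bc, c_bc: Cmin of B (MANY side) and C (ONE side) in RBC
    """
    violated = []
    if a_ac == 1 and a_ab != 1:
        violated.append("IER1")
    if a_ac == 0 and not (a_ab == 0 or b_bc == 0):
        violated.append("IER2")
    if c_ac == 1 and c_bc != 1:
        violated.append("TER1")
    if c_ac == 0 and not (c_bc == 0 or b_ab == 0):
        violated.append("TER2")
    if b_ab == 1 and b_bc == 0 and a_ac != 0:
        violated.append("MER")
    return violated


# Assumed case numbering: the last name in the tuple changes fastest,
# and mandatory (1) is listed before partial (0).
results = {}
for case, (a_ac, c_ac, b_bc, c_bc, a_ab, b_ab) in enumerate(
        product((1, 0), repeat=6), start=1):
    results[case] = violated_rules(a_ac, c_ac, a_ab, b_ab, b_bc, c_bc)

# Cases with no violation are structurally redundant under this reading.
redundant = [case for case, rules in results.items() if not rules]
# Cases rejected by the Intermediate Entity Rule alone.
mer_only = [case for case, rules in results.items() if rules == ["MER"]]
```

Under these assumptions the sketch leaves 23 of the 64 cases redundant, which agrees with the 41 non-redundant combinations reported in the text, and the only cases rejected by MER alone are 9 and 29. The sketch reports every rule a case violates, so for some cases it lists more rules than the Rules Violated column of Appendix Table 2.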