An Analysis of Cardinality Constraints in Redundant Relationships

James Dullea and Il-Yeol Song
College of Information Science and Technology
Drexel University
Philadelphia, Pennsylvania 19104
Email: james.dullea@phl.boeing.com and songiy@post.drexel.edu
Abstract
In this paper, we present a complete analysis of redundant
relationships in the entity-relationship model. Existing
approaches use the concept of functional dependencies for
identifying redundancy but ignore minimum cardinality
constraints that carry important information about the structure
of the model. Research literature on the topic is rare and
usually is confined to the ‘Many to One’-‘mandatory
participation’ case. Our approach differs from previous works in
that we consider both maximum and minimum cardinality
constraints to analyze the 4096 cases required to perform a
complete study. Our approach first looks at the maximum
cardinality constraints to develop a set of general rules to
identify groups of trivial and ambiguous structures, and then we
give greater consideration to the minimum cardinality
constraints in those groups that require detailed investigation.
With this approach we have provided a thorough pattern
analysis of redundant relationships from both a structural and
semantic view. The scope of this paper focuses on a complete
and thorough analysis of a binary relationship redundant with
respect to the composite of two binary relationships and
establishes inferences that can extend this analysis to the more
complex ‘n-relationship path’ case. We provide a complete set
of heuristics for identifying redundant relationships that can be
easily applied by data modelers and systems analysts.
1. Introduction
Entity-relationship (ER) modeling [CHEN76] is the
foundation of various analysis and design methodologies for the
development of relational databases, object oriented databases,
and object modeling. A key measure of success in the design of
these models is that they afford the accurate storage of
information without unnecessary redundancy.
Redundancy exists in two forms, redundant data and
redundant relationships. Data redundancy has received the
majority of attention in both the everyday development of
commercial databases and the research literature. The driving
force in the development of design techniques and methodologies,
from the very beginnings of entity-relationship modeling and
normalization, was the reduction of data redundancy. Less well
known is the concept of a redundant relationship. Teorey et al.
[TEOR86] offer a short definition by stating that relationships
that represent the same concept are considered to be redundant.
They indicated that redundant relationships should be
eliminated because they are “likely to result in unnormalized
relations when transforming the model in relational schemas”.
They use an example of a transitive dependency to make their
argument. Figure 1 is a reproduction from the [TEOR86] article.
FIGURE 1: Transitive relationships (Figure 5 in [TEOR86]). Entities: STUDENT, CLUB, SCHOOL. Relationships: ATTENDS, BELONGS_TO, LOCATED_IN.
In this example the minimum cardinality (Participation) is
mandatory for each relationship and the maximum cardinality is
‘Many To One’ between STUDENT and CLUB, between
STUDENT and SCHOOL, and between CLUB and SCHOOL.
[TEOR86] states that the relationship ATTENDS is redundant
because there exists a transitive dependency between
BELONGS_TO and LOCATED_IN. Briefly stated, if the CLUB
determines the SCHOOL and the STUDENT determines the
CLUB, then using this path the STUDENT can determine the
SCHOOL. If this is true, then it seems that the relationship
ATTENDS is really unnecessary and therefore redundant.
Segev, a year later, in a corrigenda to [TEOR86], challenged the
redundancy of the ATTENDS relationship, stating that an
additional semantic constraint is required
that confines the student to belonging to a CLUB that is
associated only with the school they attend [SEGE87]. Both
Teorey and Segev agree that identifying redundant relationships
must be done very cautiously and strong consideration must be
given to both structural consistency and semantic relevance.
For two relationships to be redundant they must be
structurally consistent, semantically related, and unambiguous.
We believe that approaching the problem with occurrence
diagrams, supported by matrix algebraic techniques to verify the
results of our detailed analysis, will yield a set of useful
heuristics for the complete analysis of redundant relationships.
The inclusion of minimum and maximum cardinality constraints
coupled with the concept of a semantic connection constraint
viewed from both their relatedness and completeness is key to
our analysis. It enriches the semantic information in the model
necessary to make inferences about the redundancy of a
relationship beyond a simple suspicion of redundancy.
This paper is organized as follows. Section 2 introduces the
notation and definitions used. Section 3 discusses previous
research and their limitations, and indicates the approaches used
to analyze the data. Section 4 explains the results of our analysis
from both maximum and minimum cardinality constraints, and
an informal development of the heuristics. Section 5 formalizes
the rules for redundant relationships. Section 6 concludes and
summarizes the results of the paper while presenting future
research considerations.
2. Notation and Definitions
2.1 Notation
For uniformity in presentation we must briefly define some
terminology that is key to the environment supporting
redundancy. A data model can be represented by a diagram of
entities interconnected by relationships. This connectivity
represents the mapping of the associated entities’ instances in the
relationship [TEOR94]. Teorey uses rectangles to represent
entities connected by straight lines to diamonds representing
relationships. The maximum cardinality constraint is indicated
by the fill of the diamond (see Figure 1). Minimum cardinality
is represented by an open circle placed on the line between the
entity and the relationship to indicate optional or partial
participation. The absence of an open circle indicates mandatory
or total participation. Figure 1 is an example of an
Entity-Relationship (ER) diagram constructed in Teorey notation while
Figure 2 is a similar diagram constructed using the Chen
approach. Chen’s approach allows us to explicitly show both
mandatory and optional cardinality constraints. Chen’s notation
indicates maximum cardinality by placing an ‘M’ or ‘1’ for Many
or One near the entity rectangle in the diagram.
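FIGURE 2: An Example of a Suspiciously Redundant Relationship (ATTENDS is Redundant). Entities: STUDENT, CLUB, SCHOOL. Relationships: ATTENDS, BELONGS_TO, LOCATED_IN (the diagram also carries the labels RSS, RSC, and RCS), each marked with ‘M’ and ‘1’ maximum cardinalities.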
For minimum cardinality, Chen’s notation specifically
indicates optional or mandatory participation by using a
whitened or blackened circle, respectively, on the line between
the entity and the relationship. There are slight differences
between Entity-Relationship diagramming techniques; see
[SONG95] for a comprehensive comparison of the various
techniques. We introduced the problem using Teorey notation
because it was the original diagram in [TEOR86], but we will
switch to Chen notation because it relates better to Appendix
Table 2. Throughout this paper we will use Cmax as an
abbreviation for maximum cardinality and Cmin for minimum
cardinality.
2.2 Paths and Composite Relationships
An ER diagram is made up of a path of alternating entities
and relationships. We can define a subpath as a series of
alternating entities and relationships starting with an initiating
entity and ending with a terminating entity. If a path has the
capability of starting with and ending with the same entity then it
is called a cyclic path. Figure 2 is an example of a cyclic path.
In a data model the use of Cmin and Cmax in a binary
relationship represents a specific semantic meaning between two
participating entities. There exists the case where the coupling
of more than one binary relationship, taken as a whole, represents
a specific semantic relationship greater than what is
communicated by each single binary relationship. We call this a
composite relationship: an implied relationship between
the two outermost entities in a composite of two or
more contiguous relationships that convey an additional meaning
when taken together. In Figure 2, the relationships
STUDENT/CLUB and CLUB/SCHOOL can be taken together as
the composite relationship STUDENT/CLUB/SCHOOL, which
implies additional information about Student and School. Figure
12 shows the composite of many relationships within a subpath;
we define this as an ‘n-relationship path’.
2.3 Semantic Connection Constraint
In order for this inference to take place there must exist an
additional semantic constraint imposed by the model with respect
to the composite relationship. We call this constraint a semantic
connection constraint and define it as a restriction on the
mapping of data instances in the intermediate entities that forces
them to carry transitive semantic information between the two
outermost entities. In Figure 2, the assumption that the club to
which the student belongs must be a club associated with the
student’s school in the composite relationship
STUDENT/CLUB/SCHOOL would be a semantic connection
constraint, and it could communicate additional information
about the student and the school beyond just STUDENT/CLUB
and CLUB/SCHOOL information. If the semantic connection
constraint were missing, we could make no inference about the
student’s school through the club. In order for a semantic
connection constraint to have a transitive property the component
relationships must be semantically related, unambiguous, and
complete.
2.4 Semantically Related, Unambiguous, Completeness
We define a semantic connection constraint to be
semantically related when the constraint establishes an
association between two contiguous relationships with the entity
between them acting as a surrogate and carrying sufficient
linking information that establishes a connection between the
outermost entities. In our example of the composite relationship
STUDENT/CLUB/SCHOOL, the constraint on
STUDENT/CLUB stating that each student must belong to one
and only one club, coupled with the constraint on
CLUB/SCHOOL that a club is associated with one and only one
school, is not sufficient to establish a related semantic connection
constraint. With only this constraining information, a student
could possibly be paired with a club outside the school they
attend. This lack of semantically related constraining
information was the basis of [SEGE87] corrigenda on [TEOR86].
Only when an additional semantic constraint is imposed on the
composite relationship that associates the STUDENT/CLUB
relationship with the CLUB/SCHOOL relationship is the
semantic connection related. Jones and Song addressed a similar
issue in [JONE96] concerning the relatedness of binary
relationships within a ternary relationship. They show there can
exist multiple binary relationships, some of which are related to
and supply additional semantic information about the ternary
relationship.
The concept of relatedness alone does not allow us to
analyze the redundant relationship sufficiently. The model must
exhibit no ambiguity in its connectivity across the relational path.
For example, if two schools each have a chess club, it would be
unclear as to how to associate a student to a unique school
knowing only that the student belongs to a chess club.
The semantic connection constraint must also be complete.
A relationship is complete when all the data instances being
modeled by the connecting constraint are passed between
intermediate entities sufficiently to conceptually represent the
single composite relationship between the two outermost entities.
If in our previous example, only some of the students were
required to belong to a club, then we could not infer any
association between all students and the school they attend. The
composite relationship would be related but incomplete. The
participation across all relationships does not have to be
mandatory for the semantic connection constraint to be complete; it
only has to be sufficient. In Figure 2, only some of the clubs are
associated with the students, but the semantic connection
constraint is still complete because it sufficiently allows all of
the student instances to be associated with their schools.
2.5 Redundant Relationship
We now have laid sufficient foundations to develop a
definition of redundant relationships and identify the conditions
necessary for the redundancy to exist. As mentioned earlier,
[TEOR86] states that two relationships that represent the same
concept are considered to be redundant. We amplify the word
“relationships” to include both binary relationships and
composite relationships. In a binary relationship the connectivity
between the two entities is related, unambiguous, and complete
by the very nature of the connecting relationship. In a composite
relationship the overall transitive connectivity between the two
outermost entities must be established through a semantic
connection constraint that is related, unambiguous, and
complete. For two relationships to represent the same concept,
the mappings of the data instances between the outermost
entities must be identical.
A single binary relationship occurring in a cyclic path is
defined as a redundant relationship if there exists a composite
relationship that completes the cyclic path and represents the
same concept through a semantic connection constraint that is
semantically related, structurally unambiguous, and sufficiently
complete. Figure 2 is an example of a cyclic path containing a
redundant relationship if the constraint that ‘all Students are
required to belong to a Club that is associated only with the
student’s School’ is imposed upon the composite relationship.
3. Approach
3.1 Existing Approaches
Research on the thorough analysis of redundant
relationships is infrequently found in the literature. The
available research depends heavily on semantic information, the
functional dependency of data items and Armstrong’s transitive
rule to determine redundancy [AZAR86] [ORLO90] [WU92].
[AZAR86] presents an algorithm using functional dependencies,
join dependency components, and inclusion dependencies to
identify data and relationship redundancy. The algorithm uses
renaming procedures transforming local properties into universal
attributes while introducing inclusion dependencies to derive
redundant items. [ORLO90] uses a natural language interpreter
to collect candidates for ‘elementary fact types’ of semantic
constraints.
Elementary fact types are related to data
dependencies in order to develop functional dependencies. They
use a concept called a ‘derived fact type’ that is derived from the
elementary fact types and corresponding functional dependencies
that are redundant. [WU92] uses a vector approach to analyze
redundancy. Relationships are expressed as vectors whose
components are values of ones and zeroes depending on the
functional dependencies of the candidate key in the connecting
entities. The product resultant of the vectors is compared against
the suspicious redundant relationship vector to determine
possible redundancy.
3.2 Limitations of Existing Approaches
The above three approaches depend heavily on the use of
functional dependencies to identify redundant relationships.
There are two reasons why we believe that further exploration in
this area is appropriate. First, functional dependencies do not
take into consideration minimum cardinality constraints.
Although these constraints are still available in the semantic
information, they become difficult to discern where multiple
relationships exist in the relationship path. Second, there is no
evidence that the use of functional dependency methods can be
applied to the analysis of composite relationships. We also feel
that these previous methods and the current literature do not
address the more complicated redundancy issues, such as the
Many To Many scenarios, and they are not readily transferable to
other modeling techniques, such as object-oriented modeling.
We believe that the redundant relationship paradigm has not
been fully explored, and that both minimum cardinality and
maximum cardinality play an important role in identifying
redundancies. In order to explain our concept we will introduce
a generic example as shown in Figure 3, similar to the [TEOR86]
example, that will be drawn upon throughout this paper.
FIGURE 3: A generic example (cardinality constraints not shown). Entities: A, B, C. Relationships: RAB, RBC, RAC.
The center of our analysis will be the examination of the possible
redundancy of relationship RAC with respect to composite
relationship RABC. We will assume for our analysis that
relationship RAB is semantically related to relationship RBC
through entity B. This means that for each occurrence of Entity
A associated with an occurrence in Entity B there exists an
association with an occurrence in Entity C through Entity B that
carries connectivity information from the occurrence in Entity A.
It has been previously stated that without RAB and RBC being
semantically related the question of redundancy between
relationships RAC and RABC is moot and thus further
analysis is not required. With that assumption in mind our focus
will be on the ambiguity and completeness of both relationships
(RAC and RABC) with respect to maximum and minimum
cardinality constraints.
4. Analysis
In our simple ABC example there are 4096 different
combinations based on both Cmax and Cmin. There are four
variations of Cmax (1:1, M:1, 1:M, and M:N) for each of three
relationships, yielding 4^3 (or 64) possible combinations.
Appendix Table 1 shows the 64 Cmax combinations for the
generic example in Figure 3. We will refer to each entry in
Appendix Table 1 as Cmax Group 1 through 64. Cmin has two
possible variations (mandatory participation or optional
participation) on each of the two sides of each of the three
relationships, giving 2^6 (or 64) possible combinations. Cmax and
Cmin taken together yield 64 times 64, or 4096, combinations. Our
first objective was to identify trivial patterns at the Cmax level to
reduce the number of combinations to a manageable task. Howe
identifies a connection trap with respect to maximum cardinality
constraints that he calls ‘the fan trap’[HOWE89].
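The combination count given above can be reproduced mechanically. The following is a minimal sketch; the names CMAX, RELATIONSHIPS, cmax_groups, and cmin_cases are illustrative and do not come from the paper, and the enumeration order is not claimed to match the group numbering of Appendix Table 1.

```python
from itertools import product

# The four maximum-cardinality variations named in the text.
CMAX = ("1:1", "M:1", "1:M", "M:N")
# The three relationships of the generic example in Figure 3.
RELATIONSHIPS = ("RAC", "RAB", "RBC")

# One Cmax value per relationship: 4^3 combinations.
cmax_groups = list(product(CMAX, repeat=len(RELATIONSHIPS)))

# Each relationship has two sides, and each side is either
# mandatory (True) or optional (False): 2^6 combinations.
sides = 2 * len(RELATIONSHIPS)
cmin_cases = list(product((True, False), repeat=sides))

total_cases = len(cmax_groups) * len(cmin_cases)
print(len(cmax_groups), len(cmin_cases), total_cases)  # 64 64 4096
```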
4.1 The Fan Rule (FAN)
The fan trap exists when a composite relationship contains
an intermediate entity that contains two opposing ‘M:1’
cardinality constraints. The two relationships fan out (M:1-1:M)
with respect to the maximum cardinality constraint from the
intermediate entity. An example of a fan relationship is shown
in Figure 4 where the diagram represents ‘many employees can
belong to one department and a department has the responsibility
for many projects’.
FIGURE 4: An example of a FAN relationship. EMPLOYEE and DEPARTMENT are joined by RED (M:1); DEPARTMENT and PROJECT are joined by RDP (1:M).
A quick look at this composite relationship might lead one to
infer that a relationship can be developed between an employee and a
project, but with this structure no such inference can be made.
Although there is connectivity between them, an occurrence of
an employee with a department does not uniquely identify a
project occurrence because the department does not relate an employee
to a particular project. The presence of a fan relationship in a path renders
the path ambiguous with respect to inferring connectivity
between the outermost entities. We can therefore state
that if a fan relationship exists in either path then no inference can
be made about redundancy of the relationships.
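The ambiguity can be seen at the occurrence level. The sketch below uses invented instances (E1, D1, P1, and so on are hypothetical and not taken from Figure 4) and a hypothetical helper, candidate_projects, to follow the path from an employee through the department.

```python
# Hypothetical occurrences for the structure of Figure 4.
# RED: each employee belongs to exactly one department (M:1).
employee_department = {"E1": "D1", "E2": "D1", "E3": "D2"}
# RDP: each department is responsible for many projects (1:M).
department_projects = {"D1": {"P1", "P2"}, "D2": {"P3", "P4"}}


def candidate_projects(employee):
    """Return every project reachable from an employee via the department."""
    department = employee_department[employee]
    return department_projects.get(department, set())


for employee in sorted(employee_department):
    projects = candidate_projects(employee)
    status = "unique" if len(projects) == 1 else "ambiguous"
    print(employee, sorted(projects), status)
```

With these instances every employee reaches more than one project, so the path alone cannot say which project an employee is associated with.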
A ‘Many to Many’ relationship between two entities can be
decomposed into two ‘Many to 1’ relationships [HOWE89]. The
mechanics are quite simple and many references are available
[BRUC92] [HOWE89] [SHEP90]. Briefly, the relationship
between the two original entities is converted into a surrogate
entity with a ‘MANY’ cardinality on each side. The original
entities are connected to the surrogate entity with ‘ONE’
cardinality constraints.
Since the decomposition of the M:N relationship yields 1:M
and M:1 relationships, any connectivity with a 1:M relationship
on the ‘ONE’ side will result in a fan relationship. The fan rule
states that if a fan relationship or ‘a M:N relationship coupled
with a M:1 relationship on the ONE side’ exists in either path
then no inference can be made about the redundancy of the
relationships. Applying the fan rule coupled with M:N
decomposition, the following Cmax Groups 10, 12, 14, 16, 26,
28, 30, 32, 42, 44, 46, 48, 58, 60, 62, and 64 from Appendix
Table 1, can be eliminated from redundancy consideration
because of ambiguity.
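As a rough check of the fan rule at the Cmax level, the sketch below enumerates the 64 Cmax combinations and flags those whose composite path A/B/C contains a fan, treating an M:N relationship as its 1:M and M:1 decomposition. It assumes that a Cmax written ‘X:Y’ reads left to right along the path from A to C; that reading, and the helper names, are assumptions made for illustration. Only the composite path is tested, and the group numbers of Appendix Table 1 are not reproduced.

```python
from itertools import product

CMAX = ("1:1", "M:1", "1:M", "M:N")


def left_is_many(cmax):
    """True when the left-hand entity is on a MANY side (M:1 or M:N)."""
    return cmax.split(":")[0] == "M"


def right_is_many(cmax):
    """True when the right-hand entity is on a MANY side (1:M or M:N)."""
    return cmax.split(":")[1] in ("M", "N")


def has_fan(rab, rbc):
    """Fan at B: many A per B in RAB and many C per B in RBC."""
    return left_is_many(rab) and right_is_many(rbc)


fan_groups = [
    (rac, rab, rbc)
    for rac, rab, rbc in product(CMAX, repeat=3)
    if has_fan(rab, rbc)
]
print(len(fan_groups))  # 16
```

The count of 16 equals the number of Cmax groups listed above as eliminated by the fan rule.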
4.2 The Many to Many Rule (MMR)
Taking a closer look at ‘Many to Many’ relationships, we
came to the conclusion that any path containing a M:N
relationship is ambiguous, with three trivial exceptions. First, if
relationship RAC is M:N and relationship RABC represents the
decomposition of RAC (RAB being 1:M and RBC being M:1)
then relationship RAC obviously would be redundant to RABC.
Second, if relationship RAC is M:N and relationship RAB is 1:1
with relationship RBC being M:N and identical to RAC, then
relationship RAC is obviously redundant to relationship RABC.
The third case is similar to the second, with relationship
RAB being M:N and relationship RBC being 1:1. In these three
special cases the Cmin must be mandatory participation for all
relationships; otherwise they would fall into the ambiguous
category.
We concluded that a path containing a ‘Many to Many’
relationship is ambiguous with respect to our redundancy
analysis because there is no functional dependency constraint
between the two entities [JONE96] [TEOR86]. Without some
functional dependency the question of redundancy could only be
resolved at the data instance level. This would mean that the
state of redundancy could change on a per update basis and
would depend on the particular data instances (for example, on
which courses the students picked). Our
objective is to identify a stable set of consistent heuristics
concerning redundancy irrespective of the data instances. We
therefore can define the Many-to-Many Rule stating that if a
‘Many to Many’ relationship exists in either path then no
inference can be made about redundancy of the relationships.
Applying the ‘Many to Many’ rule, Cmax Groups 4, 13, 14*, 15,
16*, 20, 28*, 29, 30*, 31, 32*, 36, 39, 40, 45, 46*, 47, 48*, 49,
50, 51, 52, 53, 54, 55, 56, 57, 58*, 59, 60*, 61, 62*, 63, and 64*
from Appendix Table 1, can be eliminated from redundancy
consideration because of ambiguity ( * indicates also eliminated
by the fan rule).
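The rule and its three trivial exceptions, as worded above, can be written as a small classifier. This is only a sketch: the function name mmr_status and the tuple order (RAC, RAB, RBC) are assumptions, and it makes no attempt to reproduce the group numbers of Appendix Table 1.

```python
# The three trivial exceptions described in the text, as (RAC, RAB, RBC).
TRIVIAL_EXCEPTIONS = {
    ("M:N", "1:M", "M:1"),  # RABC is the decomposition of RAC
    ("M:N", "1:1", "M:N"),  # RBC is M:N and identical to RAC
    ("M:N", "M:N", "1:1"),  # RAB is M:N and RBC is 1:1
}


def mmr_status(rac, rab, rbc):
    """Classify one Cmax combination under the Many-to-Many Rule."""
    combo = (rac, rab, rbc)
    if "M:N" not in combo:
        return "not covered by MMR"
    if combo in TRIVIAL_EXCEPTIONS:
        # Redundant only if Cmin is mandatory for all relationships.
        return "trivial exception"
    return "ambiguous"


print(mmr_status("M:N", "1:M", "M:1"))
print(mmr_status("M:1", "M:N", "M:1"))
print(mmr_status("M:1", "M:1", "M:1"))
```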
4.3 The Directional Rule (DIR)
4.3.1 Directional Constraints (DIR1-Expanding and Contracting)
We define a directional constraint as the requirement that both
relational paths be expanding or contracting in the same direction with
respect to Cmax. For example, in Figure 3, if relationship RAC
has the cardinality of M:1, then we could consider that the
composite relationship RABC is a suspiciously redundant
relationship if the Cmax for RABC is ‘M:1-M:1’, ‘M:1-1:1’ or
‘1:1-M:1’. Both relationships would be contracting. On the
other hand, if relationship RABC had a cardinality of ‘1:M-1:M’,
‘1:M-1:1’, or ‘1:1-1:M’ it would be expanding and redundancy
would be impossible. Since the maximum cardinality constraint
only addresses occurrences that participate in the relationships
we can make directional constraint inferences with respect to the
initiating entity or the terminating entity. At this point in our
analysis we cannot make inferences about intermediate entities
because the minimum cardinality constraint will play an
important role in the connectivity between the outermost entities.
We will address minimum cardinality later in this section. For
now we will confine our rule to the initiating and terminating
entities. The directional rule (DIR1) states that for two paths to
be suspiciously redundant, the Cmax on the initiating and
terminating entities of each path must be either expanding or
contracting in the same direction, or they must at least be
constant (1:1).
4.3.2 Directional Constraints (DIR2-Constant)
In our definition of directional constraint we left out the
possibility of a constant Cmax. We did this on purpose so as not to
confuse expanding and contracting concepts with what we
consider to be a special case. When combined with a M:1
relationship in a relationship path, a constant relationship has a
neutral effect if the participation is mandatory or a more
constraining effect if the participation is optional. We will
address these issues in the second half of this section. What we
need to address is the situation of relationship RAC being ‘One
to One’. Again we can only make inferences about the initiating
and terminating entities. We can state a corollary to the above
directional rule that applies when one path has a 1:1 cardinality
constraint (or a series of 1:1 cardinality constraints). The
directional rule (DIR2) states that if two paths are to be
suspiciously redundant with one path having 1:1 cardinality then
the Cmax in the other path must connect with the initiating and
terminating entities on their ‘ONE’ sides. If the ‘Many’ side were
connected to the initiating or terminating entity then we would
have either an expanding or contracting Cmax and violate the
previous directional rule (DIR1). It is acceptable to have a ‘Many’
constraint on an intermediate entity if and only if the Cmin is
optional. Applying the Directional Rule, Cmax Groups 2, 6, 7, 8,
9, 10*, 11, 12*, 17, 18, 21, 22, 23, 24, 33, 35, 32*, 36, 41, 42*,
43, and 44* can be eliminated from redundancy consideration
because of ambiguity ( * indicates eliminated by the fan rule).
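The expanding and contracting test of DIR1 can be sketched as below, using the M:1 examples given in Section 4.3.1. The function names direction and dir1_allows are hypothetical. The sketch covers only the case where RAC is M:1 or 1:M; the constant case governed by DIR2, and paths containing M:N, are deliberately left out.

```python
def direction(path):
    """Classify a path (a tuple of Cmax values read from A to C)."""
    kinds = set(path)
    if kinds == {"1:1"}:
        return "constant"
    if kinds <= {"M:1", "1:1"}:
        return "contracting"
    if kinds <= {"1:M", "1:1"}:
        return "expanding"
    return "mixed"


def dir1_allows(rac, composite):
    """True when RAC and the composite path move in the same direction."""
    if rac not in ("M:1", "1:M"):
        # A 1:1 RAC falls under DIR2, which this sketch does not model.
        raise ValueError("only M:1 and 1:M are handled here")
    return direction((rac,)) == direction(composite)


# With RAC being M:1, contracting composites remain suspicious.
for composite in (("M:1", "M:1"), ("M:1", "1:1"), ("1:1", "M:1")):
    print(composite, dir1_allows("M:1", composite))  # True each time
# Expanding composites cannot be redundant with an M:1 RAC.
for composite in (("1:M", "1:M"), ("1:M", "1:1"), ("1:1", "1:M")):
    print(composite, dir1_allows("M:1", composite))  # False each time
```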
4.4 Summary of Cmax Analysis
Applying the Fan, Many-to-Many, and Directional Rules,
we were able to identify that the following Cmax Groups were
ambiguous and thus we are able to eliminate them from
redundancy consideration. They are 2, 4, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24, 26, 28, 29, 30, 31, 32,
33, 35, 36, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, and 64 from
Appendix Table 1. Also, in our consideration, groups 5, 34, 37,
and 38 are the mirror images of groups 3, 25, 19, and 27,
respectively. These four groups can be eliminated from our
Cmin analysis because they will yield the same results and be
governed by the same rules as their mirror images. This leaves
us with five groups to be explored. They are groups 1, 3, 19, 25,
and 27. We believed that group 27 (shown in Figure 5), where
the relationship RAC is M:1, RAB is M:1, and RBC is M:1, was
the most important and the other four groups were either special
cases or subsets of group 27.
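The bookkeeping in this summary can be checked by taking the union of the three group lists given in Sections 4.1 through 4.3. The sketch below copies those lists from the text; the variable names are illustrative only.

```python
# Cmax groups eliminated by each rule, as listed in the text.
fan = {10, 12, 14, 16, 26, 28, 30, 32, 42, 44, 46, 48, 58, 60, 62, 64}
many_to_many = {
    4, 13, 14, 15, 16, 20, 28, 29, 30, 31, 32,
    36, 39, 40, 45, 46, 47, 48,
} | set(range(49, 65))
directional = {
    2, 6, 7, 8, 9, 10, 11, 12, 17, 18, 21, 22, 23, 24,
    32, 33, 35, 36, 41, 42, 43, 44,
}

eliminated = fan | many_to_many | directional
remaining = set(range(1, 65)) - eliminated

# Mirror-image groups and the group each one mirrors.
mirror_of = {5: 3, 34: 25, 37: 19, 38: 27}
to_explore = sorted(remaining - set(mirror_of))

print(len(eliminated))    # 55
print(sorted(remaining))  # [1, 3, 5, 19, 25, 27, 34, 37, 38]
print(to_explore)         # [1, 3, 19, 25, 27]
```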
FIGURE 5: ABC example for group 27 (maximum cardinality constraints shown). RAC, RAB, and RBC are each M:1.
4.5 Cmin Analysis of Group 27 (M:1-M:1-M:1 Cases)
As stated before there are 64 different Cmin possibilities.
Appendix Table 2 shows the 64 cases to be analyzed in this
section and Figure 5 shows our ABC example for group 27 with
the cardinality constraints. The results of the analysis are
presented in Appendix Table 2; each Cmin case is numbered
from 1 to 64 to allow referencing.
For each of the 64 Cmin cases we developed an occurrence
diagram [ELMA94] to assist us in examining the structural
constraints on the relationships. In analyzing the patterns, we
found some phenomena of interest that required further research.
A driving factor in the results was the concept of an ‘Initiating
Entity’ and a ‘Terminating Entity’. The terms initiating and
terminating are arbitrary according to how the diagram is viewed
but relative only to each other for analysis purposes.
In our ABC example of Cmax group 27, we found 23 of the
64 cases to be redundant. Examining both the group of 23
redundant cases and the 41 non-redundant cases, we found three
structural patterns that allowed us to determine relationship
redundancy status for Cmax group 27. First, we found a rule that
applied to the ‘Initiating Entity’. The Initiating Entity Rule (IER1)
states that when the Cmin of the ‘Initiating Entity’ (Entity A) has
mandatory participation in one subpath, the Cmin must
be mandatory in the other subpath. Figure 6 shows a set of
relationships that meet the Initiating Entity Rule (IER1) while
Figure 7 shows a set of relationships that violate the rule. In
Figure 7c the relationships are not transitive even though they
are related because they are not complete with respect to student
A6, which does not participate in RAB. As a corollary to
IER1, we also recognized that when the ‘Initiating Entity’
(Entity A) was partially participating in one subpath, it
necessitated that in the other subpath either the Cmin for the
initiating entity needed to be partial or the Cmin for Entity B (an
intermediate entity) on the ‘MANY’ side needed to be partial.
We called this IER2. Figure 8 shows a set of relationships that
meet the Initiating Entity Rule (IER2) while Figure 9 shows a
set of relationships that violate the rule. In Figure 9b the
relationship RAC is unrelated to the composite relationship RAB
and RBC shown in Figure 9c. They are unrelated because Figure
9b indicates that tuple A6 is not associated with entity C while
Figure 9c forces a relationship through its association with entity
B. The relationship RAC in this case is not redundant.
FIGURE 6a: An example of a Relationship Structure That Meets All Three Rules (RAC is Redundant)
FIGURE 6b: An occurrence diagram showing all tuples (A1 thru A6) of the Student Entity participating in the Relationship RAC
FIGURE 6c: An Occurrence Diagram showing a related and complete transitive relationship between entities A/B and B/C from Figure 6a
FIGURE 7a: An Example of a Relationship Structure That Violates the Initiating Entity Rule
FIGURE 7b: An Occurrence Diagram Showing All Tuples (A1 thru A6) of the Student Entity Participating in the Relationship RAC
FIGURE 7c: An Occurrence Diagram showing related but incomplete transitive relationships between entities A/B and B/C from Figure 7a; one tuple in entity A (A6) does not participate in the Relationship RAB
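The Initiating Entity Rule can be illustrated at the occurrence level by storing each M:1 relationship as a mapping and comparing RAC with the composite of RAB and RBC. This is a minimal sketch: the tuple mappings are invented for illustration and are not read from Figures 6 through 9, and compose and is_redundant are hypothetical helpers.

```python
def compose(rab, rbc):
    """Pairs A -> C obtained by following RAB and then RBC."""
    return {a: rbc[b] for a, b in rab.items() if b in rbc}


def is_redundant(rac, rab, rbc):
    """RAC is redundant only if it maps exactly like the composite."""
    return rac == compose(rab, rbc)


# Hypothetical occurrences; B4 takes no part in RBC (B partial there).
rbc = {"B1": "C1", "B2": "C1", "B3": "C2"}
rab_total = {"A1": "B1", "A2": "B1", "A3": "B2",
             "A4": "B2", "A5": "B3", "A6": "B3"}
rac_total = {"A1": "C1", "A2": "C1", "A3": "C1",
             "A4": "C1", "A5": "C2", "A6": "C2"}

# Versions in which A6 does not participate.
rab_partial = {a: b for a, b in rab_total.items() if a != "A6"}
rac_partial = {a: c for a, c in rac_total.items() if a != "A6"}
# A6 participates in RAB, but only with B4, which has no C.
rab_via_b4 = {**rab_partial, "A6": "B4"}

print(is_redundant(rac_total, rab_total, rbc))      # A total in both
print(is_redundant(rac_total, rab_partial, rbc))    # A6 missing in RAB
print(is_redundant(rac_partial, rab_total, rbc))    # A6 forced to a C
print(is_redundant(rac_partial, rab_partial, rbc))  # A partial in both
print(is_redundant(rac_partial, rab_via_b4, rbc))   # B partial instead
```

Under these assumed mappings the first, fourth, and fifth calls report redundancy, while the second and third do not, mirroring the situations that the captions of Figures 7c and 9c describe.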
FIGURE 8a: An Example of a Relationship Structure That Meets All Three Rules (RAC is Redundant)
FIGURE 8b: An Occurrence Diagram Showing Only Tuples (A1 thru A5) of Entity A Participates in the Relationship RAC
FIGURE 9a: An Example of a Relationship Structure That Violates the Initiating Entity Rule
FIGURE 9b: An Occurrence Diagram Showing Only Tuples (A1 thru A5) of Entity A Participates in the Relationship RAC
We also tested this rule for the generalized ‘n-relationship path’
using occurrence matrices and comparing the resultant matrix of
each path, and found it to be true for mandatory participation. It
was also true for the partial participation cases and required at
least the initiating entity to be partial or at least one of the N-2
intermediate entities to be partial.
FIGURE 8c: An Occurrence Diagram showing a related and complete transitive relationship between A/B and B/C from Figure 8a. Complete in that A's tuples (A1 thru A5) are mapped to their respective C's tuples (C1 and C2) with respect to Figure 9b.
FIGURE 9c: An Occurrence Diagram showing a unrelated transitive relationships between A/B and B/C with respect to Figure 9b. A6 Is forced to participate in the relationship RAB with Entity B and because of the total participation constraint of Entity B with Entity C, Entity A is forced to be associated with Entity C. The two subpaths have unrelated semantic meanings and are not redundant.
FIGURE 10a: An example of a relationship structure that violates the Terminating Entity Rule
FIGURE 10b: An Occurrence Diagram showing only tuples (C1 and C2) of the Entity C participating in the relationship RAC
Second, we found a similar rule for the ‘Terminating Entity’,
which we call the Terminating Entity Rule (TER1). It states that
when the Cmin of the ‘Terminating Entity’ is mandatory in one
subpath, it must also be mandatory in the other subpath. Again we
recognized that when the ‘Terminating Entity’ participates
partially in one subpath, then in the other subpath either the
Cmin for the terminating entity or the Cmin for the intermediate
entity on the ‘ONE’ side must be partial (TER2). Figure 10 shows a
set of relationships that violate the Terminating Entity Rule.
We again tested this rule for the generalized ‘n-relationship
path’ and found that it holds for mandatory participation. It also
holds for the partial participation group, provided that at least
the terminating entity or at least one of the N-2 intermediate
entities is partial.
FIGURE 10c: An Occurrence Diagram showing that at least one of the
tuples must participate with Entity C's C3 because of the Total
Participation Constraint on RAB and RBC. This is not consistent
with Figure 10b.
Of the 41 combinations that failed to be redundant under both the
occurrence matrix method and the occurrence diagrams, only two
violated neither the “Initiating Entity Rule” nor the “Terminating
Entity Rule”: cases 9 and 29 (see Appendix Table 2). Further
analysis showed that if the Cmin of one of the intermediate
entities is mandatory on the ‘ONE’ side and partial on the ‘MANY’
side, then the Cmin of the ‘Initiating Entity’ of the opposing path
must be partial. This concept remained consistent with the other 39
cases. We call the application of this concept the Intermediate
Entity Rule (MER). Figure 11 shows a set of relationships that
violate the Intermediate Entity Rule.
The rule can also be extended to an ‘n-relationship path’ diagram
with one additional requirement: in the ‘n-relationship path’
model, all the intermediate entities must be mandatory on the
‘ONE’ side.
FIGURE 11a: An example of a relationship structure that violates
the Intermediate Entity Rule
FIGURE 11b: An Occurrence Diagram showing the relationship RAC in
Figure 11a.
FIGURE 11c: An Occurrence Diagram showing that the transitivity of
A6 is not complete because of the Participation Constraint on RBC.
4.6 Cmin Analysis of Cmax Groups 1, 3, 19, and 25
Cmax groups 19 and 25 contain at least one ‘Many to One’
cardinality constraint in both of the relationship paths. We
found that the rules developed from our analysis of Cmax group
27 hold for Cmax groups 19 and 25 without any modification, even
with the introduction of the ‘One to One’ constraint. Cmax groups
1 and 3 are different. When one or both paths consist of a ‘One to
One’ cardinality constraint (or a series of ‘One to One’
constraints), additional restrictions on the connectivity are
introduced. We found that the rules for Cmax group 27 still hold
for groups 1 and 3, subject to the following additional
constraints. For Cmax group 1, the structures presented in cases
2 and 36 could not yield a redundant relationship. For Cmax
group 3, the structures presented in cases 1, 21, 35, 41, 43, 55,
61, and 63 also could not yield a redundant relationship. This
required an additional rule for when a ‘One to One’ cardinality
constraint exists in at least one path. We call this the
One-to-One Rule (11R1). With respect to Cmax group 1, we found
that if one path is mandatory on the terminating entity side, then
in the other path the intermediate entity’s cardinality constraint
cannot be optional on the initiating entity’s side and mandatory
on the terminating entity’s side. With respect to Cmax group 3, we
found that in the path containing the M:1 cardinality, the
initiating entity’s ‘One to One’ cardinality constraint on the
intermediate entity side must be optional for a redundant
relationship to exist (11R2).
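The Cmax group numbers used in this section follow the row ordering of Appendix Table 1. As a minimal sketch, that numbering can be regenerated by enumerating the maximum cardinality of RAC, RAB, and RBC in the order the table lists them. The names RAC_ORDER, SUBPATH_ORDER, and CMAX_GROUPS are introduced here only for illustration.

```python
from itertools import product

# Column orderings as they appear in Appendix Table 1.
RAC_ORDER = ("1-1", "M-1", "1-M", "M-N")
SUBPATH_ORDER = ("1-1", "1-M", "M-1", "M-N")

# Group number -> (Cmax of RAC, Cmax of RAB, Cmax of RBC)
CMAX_GROUPS = dict(
    enumerate(product(RAC_ORDER, SUBPATH_ORDER, SUBPATH_ORDER), start=1)
)

# The groups discussed in Sections 4.5 and 4.6.
for group in (1, 3, 19, 25, 27):
    print(group, CMAX_GROUPS[group])

# 64 Cmax groups x 2**6 Cmin cases per group = 4096 cases in total.
print(len(CMAX_GROUPS) * 2 ** 6)  # 4096
```

Under this ordering group 27 is (M-1, M-1, M-1), and groups 1 and 3 are the only ones discussed here whose RAC and RAB constraints are both ‘One to One’.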
5. Rules for Analyzing Redundant Relationships
The above analysis led to the development of a set of
heuristics for two paths contained within a cyclic path (starting
with E1 and ending with EN, with E2...N-1 between them on only
one of the paths). In order for two paths (Path X and Path Y) to
be redundant to each other they must meet all of the following
cardinality constraint rules:
FIGURE 12: An Example of a Redundant 'N-Relationship Path'
Rules Dealing with Maximum Cardinality
The Fan Rule (FAN)
FAN If a fan relationship or ‘a M:N relationship coupled with
a M:1 relationship on the ONE side’ exists in either path
then no inference can be made about the redundancy of
one path to the other.
The Many-to-Many Rule (MMR)
MMR If a non-trivial ‘Many to Many’ relationship exists in
either path then no inference can be made about
redundancy of one path to the other.
The Directional Rule (DIR)
DIR1 The maximum cardinality constraints on the initiating and
terminating entities of each path must be in the same
direction, or they must at least be constant (1:1) for a
redundant relationship to be possible.
DIR2 If two paths are suspiciously redundant with one path
having 1:1 cardinality then the Cmax in the other path
must connect with the initiating and terminating entities
on their ‘ONE’ sides.
Rules Dealing with Minimum Cardinality
Initiating Entity Rule (IER)
IER1 If the Cmin associated with E1 for Path X is 1 (mandatory
participation) then the Cmin associated with E1 for Path
Y must also be 1.
IER2 If the Cmin associated with E1 for Path X is 0 (partial
participation) then in Path Y either the Cmin of E1 must
be 0 or the Cmin of at least one E2...N-1 on the “MANY”
side must be 0.
Terminating Entity Rule (TER)
TER1 If the Cmin associated with EN for Path X is 1 (mandatory
participation) then the Cmin associated with EN for Path
Y must also be 1.
TER2 If the Cmin associated with EN for Path X is 0 (partial
participation) then in Path Y either the Cmin of EN must
be 0 or the Cmin of E2...N-1 on the “ONE” side must be 0.
Intermediate Entity Rule (MER)
MER If the Cmin associated with all E2...N-1 is 1 (mandatory
participation) on the “ONE” side and at least one of the
Cmin associated with any E2...N-1 is 0 (partial
participation) on the “MANY” side then the Cmin of E1
must be 0 (partial participation) in Path X.
One-to-One Rule (11R)
11R1 If both paths are a series of ‘One to One’ maximum
cardinality constraints and one path is mandatory on the
terminating entity side then in the other path any
intermediate entity’s cardinality constraint cannot be
optional on the initiating entity’s side and mandatory on
the terminating entity’s side.
11R2 If only one path is a series of ‘One to One’ maximum
cardinality constraints then in the path containing the
‘Many to One’ cardinality the initiating entity’s ‘One to
One’ cardinality constraint on the intermediate entity side
must be partial.
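The minimum cardinality rules IER, TER, and MER can be sketched as a small checker for the three-entity case of Cmax group 27. The sketch assumes that Path X is the direct relationship RAC, that Path Y is RAB followed by RBC, and that all three relationships are M:1. The function name violated_rules and its parameter names are hypothetical; the One-to-One Rule is not covered.

```python
from itertools import product


def violated_rules(a_x, c_x, a_y, b_one, b_many, c_y):
    """List the Cmin rules violated by one case (1 = mandatory, 0 = partial).

    a_x, c_x : Cmin of A and of C in the direct relationship RAC (Path X)
    a_y      : Cmin of A in RAB (Path Y)
    b_one    : Cmin of B in RAB, where B is on the 'ONE' side
    b_many   : Cmin of B in RBC, where B is on the 'MANY' side
    c_y      : Cmin of C in RBC (Path Y)
    """
    violated = []
    if a_x == 1 and a_y == 0:
        violated.append("IER1")
    if a_x == 0 and a_y == 1 and b_many == 1:
        violated.append("IER2")
    if c_x == 1 and c_y == 0:
        violated.append("TER1")
    if c_x == 0 and c_y == 1 and b_one == 1:
        violated.append("TER2")
    if b_one == 1 and b_many == 0 and a_x == 1:
        violated.append("MER")
    return violated


# Enumerate all 2**6 = 64 Cmin cases and keep those that pass every rule.
redundant = [case for case in product((1, 0), repeat=6) if not violated_rules(*case)]

print(violated_rules(1, 1, 1, 1, 0, 1))  # ['MER']
print(len(redundant))                    # 23
```

Under these assumptions 23 of the 64 cases pass every rule, which agrees with the 41 non-redundant combinations reported in Section 4.5. The function reports every rule a case violates, so it may list more rules for a case than the single entry shown in Appendix Table 2.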
6. Conclusion
Entity-relationship diagramming has been the engineering
foundation methodology of data modeling for over twenty years.
During that period, published research literature on the topic of
redundant relationships has been very rare, and both a complete
cardinality analysis and a set of heuristic rules to guide
analysts in real-world database modeling and design have been
absent. We have performed a complete analysis of all cardinality
constraints and developed eleven heuristic rules for deciding the
redundancy of two relationship paths. The advantages of our
method are that the heuristic rules can be easily and visually
applied to all entity-relationship diagrams for deciding
redundancy, and that they are complete in that they address all
possible combinations of minimum and maximum cardinality
constraints. Our approach of using occurrence diagrams is easily
understood, repeatable, and independent of any data instances.
We feel that previous methods relying only on functional
dependency analysis were cumbersome and confusing to apply to
the data model and ignored the minimum cardinality constraint,
allowing only for the conclusion that a relationship was
suspiciously redundant. The set of heuristic rules provided from
our analysis of redundant relationships is complete and consistent
for all cases, and readily applicable by data modelers and system
analysts.
We believe that this analysis is a major step forward in the
analysis of redundant relationships and provides an adequate
foundation for further work with the entity-relationship model in
the area of composite relationships, the inclusion of ternary
relationships in the model, analysis of composite paths, and the
existence of multiple sets of redundant relationships in a single
cyclic path. Also, the analysis can be readily applied to
object-oriented modeling, where most of the research emphasis is
being focused. Analysis of redundancy between classes with many
different modeling constructs in the object model is a new area
yet to be explored.
References
[AZAR86] Azar, N. and E. Pichat, 1986. “Translation of an extended
entity-relationship model into the universal relation with inclusions
formalism”, Entity-Relationship Approach: Ten Years of Experience
in Information Modeling. Proceedings of the Fifth International
Conference, pp. 253-60, Nov. 17-19, 1986.
[BRUC92] Bruce, Thomas A., 1992. Designing Quality Databases
with IDEF1X Information Models, Dorset House Publishing, NY.
[CHEN76] Chen, Peter, 1976. “The Entity-Relationship Model -
Toward a Unified View of Data”, ACM Transactions on Database
Systems, 1(1)9-36, March 1976.
[ELMA94] Elmasri, Ramez and Shamkant B. Navathe, 1994.
Fundamentals of Database Systems, 2nd Ed., The
Benjamin/Cummings Publishing Co, Inc., Redwood City, CA.
[HOWE89] Howe, D. R., 1989. Data Analysis for Data Base Design,
2nd Ed., Edward Arnold, London, GB.
[JONE96] Jones, Trevor H., and Il-Yeol Song, 1996. “Analysis of
Binary/Ternary Cardinality Combinations in Entity-Relationship
Modeling”, Data & Knowledge Engineering, 19(1996)39-64.
[ORLO90] Orlowska, M.E. and Zhang Yanchun, 1990. “On
enhancements of semantic methodologies for relational database
design”, Databases in the 1990s. Proceedings of the Australian
Database Research Conference, pp. 97-108, Feb. 6, 1990.
[SEGE87] Segev, Arie, 1987. “Transitive Dependencies”, in
Surveyors’ Forum, Computing Surveys, 19(2)191-193.
[SHEP90] Shepherd, John C., 1990. Database Management, Theory
and Application, Richard D. Irwin, Inc., Boston, MA.
[SONG95] Song, Il-Yeol, Mary Evans, and E. K. Park, 1995. “A
Comparative Analysis of Entity-Relationship Diagrams”, Journal of
Computer & Software Engineering, 3(4)427-459.
[TEOR86] Teorey, Toby J., Dongqing Yang, and James P. Fry, 1986.
“A Logical Design Methodology for Relational Databases Using the
Extended Entity-Relationship Model”, Computing Surveys,
18(2)197-222, June 1986.
[TEOR94] Teorey, Toby J., 1994. Database Modeling and Design
- The Entity-Relationship Approach, Morgan Kaufmann Publishers,
Inc., San Mateo, CA.
[WU92] Wu, J.Y.J., 1992. “A data modeling approach with E-R
table for the representation of enterprise system”, International
Journal on Information and Management Sciences, 3(1)79-100,
June 1992.
Appendix Table 1: Maximum Cardinality Constraints for the
ABC model and the rules applied to show ambiguity.

Cmax Group  RAC  RAB  RBC  Rules applied to show ambiguity
1           1-1  1-1  1-1  Unambiguous
2           1-1  1-1  1-M  Directional Rule
3           1-1  1-1  M-1  Unambiguous
4           1-1  1-1  M-N  Many-to-Many Rule
5           1-1  1-M  1-1  Unambiguous
6           1-1  1-M  1-M  Directional Rule
7           1-1  1-M  M-1  Directional Rule
8           1-1  1-M  M-N  Directional Rule
9           1-1  M-1  1-1  Directional Rule
10          1-1  M-1  1-M  Fan Rule, Directional Rule
11          1-1  M-1  M-1  Directional Rule
12          1-1  M-1  M-N  Fan Rule, Directional Rule
13          1-1  M-N  1-1  Many-to-Many Rule
14          1-1  M-N  1-M  Fan Rule, Many-to-Many Rule
15          1-1  M-N  M-1  Many-to-Many Rule
16          1-1  M-N  M-N  Fan Rule, Many-to-Many Rule
17          M-1  1-1  1-1  Directional Rule
18          M-1  1-1  1-M  Directional Rule
19          M-1  1-1  M-1  Unambiguous
20          M-1  1-1  M-N  Many-to-Many Rule
21          M-1  1-M  1-1  Directional Rule
22          M-1  1-M  1-M  Directional Rule
23          M-1  1-M  M-1  Directional Rule
24          M-1  1-M  M-N  Directional Rule
25          M-1  M-1  1-1  Unambiguous
26          M-1  M-1  1-M  Fan Rule
27          M-1  M-1  M-1  Unambiguous
28          M-1  M-1  M-N  Fan Rule
29          M-1  M-N  1-1  Many-to-Many Rule
30          M-1  M-N  1-M  Fan Rule, Many-to-Many Rule
31          M-1  M-N  M-1  Many-to-Many Rule
32          M-1  M-N  M-N  Fan Rule, Many-to-Many Rule
33          1-M  1-1  1-1  Directional Rule
34          1-M  1-1  1-M  Unambiguous, Mirror Image of 25
35          1-M  1-1  M-1  Directional Rule
36          1-M  1-1  M-N  Many-to-Many Rule
37          1-M  1-M  1-1  Unambiguous, Mirror Image of 19
38          1-M  1-M  1-M  Unambiguous, Mirror Image of 27
39          1-M  1-M  M-1  Many-to-Many Rule
40          1-M  1-M  M-N  Many-to-Many Rule
41          1-M  M-1  1-1  Directional Rule
42          1-M  M-1  1-M  Fan Rule, Directional Rule
43          1-M  M-1  M-1  Directional Rule
44          1-M  M-1  M-N  Fan Rule, Directional Rule
45          1-M  M-N  1-1  Many-to-Many Rule
46          1-M  M-N  1-M  Fan Rule, Many-to-Many Rule
47          1-M  M-N  M-1  Many-to-Many Rule
48          1-M  M-N  M-N  Fan Rule, Many-to-Many Rule
49          M-N  1-1  1-1  Many-to-Many Rule
50          M-N  1-1  1-M  Many-to-Many Rule
51          M-N  1-1  M-1  Many-to-Many Rule
52          M-N  1-1  M-N  Many-to-Many Rule
53          M-N  1-M  1-1  Many-to-Many Rule
54          M-N  1-M  1-M  Many-to-Many Rule
55          M-N  1-M  M-1  Many-to-Many Rule
56          M-N  1-M  M-N  Many-to-Many Rule
57          M-N  M-1  1-1  Many-to-Many Rule
58          M-N  M-1  1-M  Fan Rule, Many-to-Many Rule
59          M-N  M-1  M-1  Many-to-Many Rule
60          M-N  M-1  M-N  Fan Rule, Many-to-Many Rule
61          M-N  M-N  1-1  Many-to-Many Rule
62          M-N  M-N  1-M  Fan Rule, Many-to-Many Rule
63          M-N  M-N  M-1  Many-to-Many Rule
64          M-N  M-N  M-N  Fan Rule, Many-to-Many Rule

Appendix Table 2: Minimum Cardinality Constraints for the
ABC model and the rules applied to show structural
redundancy for Group 27 (M:1, M:1, M:1)

Each relationship is M:1. The two columns under a relationship
give the Cmin of the entity on its ‘MANY’ side and on its ‘ONE’
side: 1 = mandatory participation, 0 = partial participation.

Cmin   RAC         RAB         RBC         Structurally
Case   Many  One   Many  One   Many  One   Redundant     Rules Violated
1      1     1     1     1     1     1     YES
2      1     1     1     0     1     1     YES
3      1     1     0     1     1     1     NO            IER1
4      1     1     0     0     1     1     NO            IER1
5      1     1     1     1     1     0     NO            TER1
6      1     1     1     0     1     0     NO            TER1
7      1     1     0     1     1     0     NO            IER1
8      1     1     0     0     1     0     NO            IER1
9      1     1     1     1     0     1     NO            MER
10     1     1     1     0     0     1     YES
11     1     1     0     1     0     1     NO            IER1, MER
12     1     1     0     0     0     1     NO            IER1
13     1     1     1     1     0     0     NO            TER1, MER
14     1     1     1     0     0     0     NO            TER1
15     1     1     0     1     0     0     NO            IER1
16     1     1     0     0     0     0     NO            IER1
17     1     0     1     1     1     1     NO            TER2
18     1     0     1     0     1     1     YES
19     1     0     0     1     1     1     NO            IER1
20     1     0     0     0     1     1     NO            IER1
21     1     0     1     1     1     0     YES
22     1     0     1     0     1     0     YES
23     1     0     0     1     1     0     NO            IER1
24     1     0     0     0     1     0     NO            IER1
25     1     0     1     1     0     1     NO            TER2, MER
26     1     0     1     0     0     1     YES
27     1     0     0     1     0     1     NO            IER1, MER
28     1     0     0     0     0     1     NO            IER1
29     1     0     1     1     0     0     NO            MER
30     1     0     1     0     0     0     YES
31     1     0     0     1     0     0     NO            IER1, MER
32     1     0     0     0     0     0     NO            IER1
33     0     1     1     1     1     1     NO            IER2
34     0     1     1     0     1     1     NO            IER2
35     0     1     0     1     1     1     YES
36     0     1     0     0     1     1     YES
37     0     1     1     1     1     0     NO            IER2, TER1
38     0     1     1     0     1     0     NO            IER2, TER1
39     0     1     0     1     1     0     NO            TER1
40     0     1     0     0     1     0     NO            TER1
41     0     1     1     1     0     1     YES
42     0     1     1     0     0     1     YES
43     0     1     0     1     0     1     YES
44     0     1     0     0     0     1     YES
45     0     1     1     1     0     0     NO            TER1
46     0     1     1     0     0     0     NO            TER1
47     0     1     0     1     0     0     NO            TER1
48     0     1     0     0     0     0     NO            TER1
49     0     0     1     1     1     1     NO            IER2, TER2
50     0     0     1     0     1     1     NO            IER2
51     0     0     0     1     1     1     NO            TER2
52     0     0     0     0     1     1     YES
53     0     0     1     1     1     0     NO            IER2
54     0     0     1     0     1     0     NO            IER2
55     0     0     0     1     1     0     YES
56     0     0     0     0     1     0     YES
57     0     0     1     1     0     1     NO            TER2
58     0     0     1     0     0     1     YES
59     0     0     0     1     0     1     NO            TER2
60     0     0     0     0     0     1     YES
61     0     0     1     1     0     0     YES
62     0     0     1     0     0     0     YES
63     0     0     0     1     0     0     YES
64     0     0     0     0     0     0     YES