BACS485
(485Data.Doc)
DATA CHARACTERISTICS AND MODELS
I. Introduction
The purpose of this chapter is to provide the background needed to understand the rest of
the course. It emphasizes data in an organizational context.
There are a lot of terms and a lot of disjoint abstract concepts. They are necessary for all
that we do later.
Lecture Objectives:
- To distinguish between data and information
- To describe the 3 levels of data abstraction: reality, metadata, and data
- To define the various associations between data entities: 1:1, 1:M, M:M, and
  conditional
- To introduce the use of graphical notations (bubble charts, E/R) to model data and
  associations
- To understand the basics of the ANSI/SPARC 3-level model
- To define and illustrate data independence
- To introduce the semantics of data models
- To introduce the relational, semantic, and object-data models
A main point is that if you cannot represent the data unambiguously in logical terms, then
you cannot implement a database that serves the needs of the organization.
Another key point is to realize that there is no one best model for all situations. You have
to match the model to the task (and the level of data abstraction needed).
Let's begin by talking about the "nature of data"...
II. Nature of Data
When you talk about data it is important to distinguish between objects in the real world,
the structure of the database, and the data stored in the database.
Copyrighted to Jay Lightfoot, Ph.D.
There are actually 3 levels of abstraction to be considered when you talk about data:
- Reality
- MetaData
- Data itself
Note, the term abstraction will be used a lot. It means to present only the essential factors
or to pull back from and ignore detail.
You use abstractions all the time (but don't realize it). For example, the table of contents
is an abstraction of a book, a model airplane is an abstraction of the real thing, your
resume is an abstraction of your job history.
1st level of data is reality...
A. Reality
This is the level of the real world. At this level you often talk about the enterprise and the
organization for which the database is designed. For example, a bank, a school, a
government branch... .
The most basic building blocks at this level are entities.
1. Entity
Entities represent a "thing" of interest in the real world.
An entity may be an object with a physical existence (e.g., a particular person, car, house,
employee...) or it may be an object with a conceptual existence (e.g., a company, a job, a
university).
Sort of like a noun (person, place, or thing).
You would not want to create entities about everything. Only those things of interest to
the organization.
2. Attribute
Each entity has particular properties called attributes that describe it.
For example, an employee entity may be described by the employee's name, age, address,
salary, ...
Again, you only record the attributes about an entity that the organization needs to
know. For example, employee eye color is seldom needed by the organization, so it is not
collected.
An attribute that is composed of several more basic attributes is called composite while
attributes that are not divisible are called simple or atomic attributes.
Composite attributes can form a hierarchy. For example, Address can be further
subdivided into 3 simple attributes--street, city, zip.
If you always refer to a composite attribute as a whole then there is no need to subdivide
it.
a) Value
A particular entity will have a value for each of its attributes.
Note that I said a particular entity. You don't have a single value for a group of entities.
For example, each individual employee has a distinct value for social security number and
job title. Some of the job titles may be duplicated among different employees, but they
still have their own.
Most attributes have a single value for a particular entity and are called single-valued.
For example, a specific person entity has one value for Age, so Age is a single-valued
attribute of person.
In other cases, an attribute can have a set of values and is called multi-valued. For
example, a Major attribute for a student could have two values if dual majors are allowed.
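The attribute kinds described above can be sketched in Python. This is only an illustration under stated assumptions: the class names, field names, and sample values are hypothetical and are not part of the course material.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Address:
    """Composite attribute: built from three simple (atomic) attributes."""
    street: str
    city: str
    zip_code: str


@dataclass(frozen=True)
class Student:
    name: str                 # simple, single-valued
    age: int                  # simple, single-valued
    address: Address          # composite
    majors: Tuple[str, ...]   # multi-valued (dual majors allowed)


# Hypothetical sample entity with its own value for each attribute.
pat = Student("Pat", 20, Address("1 Main St", "Greeley", "80631"), ("CIS", "Finance"))
print(pat.address.city)   # one component of the composite attribute
print(len(pat.majors))    # how many values the multi-valued attribute holds
```

If Address were always used as a whole, a single string field would do and the subdivision could be dropped.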
b) Identifier
Each specific entity must have at least one attribute (or several in combination) that
uniquely distinguishes it from all other similar entities. This is called the identifier.
For example, the social security number attribute uniquely identifies each employee entity
from the other.
An entity identifier is said to functionally determine the other attributes in the entity.
(This is not a mathematical function.)
This is an important point for later.
3. Entity Class
You usually have groups of similar entities in a company. For example, there may be
thousands of employee entities in an organization. These employee entities share the
same attributes, but each has its own value(s) for each attribute.
All of these entities have identical structure, so you can group them into what is called an
entity class or entity type.
This is a concept similar to a data file made up of records (except at a different level of
abstraction... remember we are talking about reality). The records all have identical
structure but distinct values.
Again, note that every entity class must have an attribute (or combination of attributes)
that uniquely identifies its distinct entities.
The set of individual entity instances at a particular moment in time is called an extension
of the entity class.
The entity class does not change often, but the extension normally does.
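A rough Python sketch of the difference between an entity class and its extension follows. The Employee fields and the sample values are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:          # the entity class: one shared structure
    ssn: str             # identifier
    name: str
    job_title: str


def identifier_is_unique(extension):
    """True if no two entities in the extension share an identifier value."""
    ssns = [e.ssn for e in extension]
    return len(ssns) == len(set(ssns))


# The extension: the entity instances that exist at this moment in time.
extension = [
    Employee("111-11-1111", "Ann", "Clerk"),
    Employee("222-22-2222", "Bob", "Clerk"),   # a duplicated job title is fine
]
assert identifier_is_unique(extension)

# The extension changes (a new hire); the entity class itself does not.
extension.append(Employee("333-33-3333", "Cy", "Analyst"))
```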
4. Associations
Attributes are properties of individual entities. Another essential type of property is called
associations. This is the relationship between 2 or more entities.
The associations can exist between entities in the same entity class and between entities in
different entity classes.
Associations are one of the keys to the power of database over the traditional file based
approach.
In the traditional approach, these associations were hard-coded into the programs, making
them difficult to maintain. Database approach captures this aspect of reality directly.
Note that there are association types and association instances. For example, an
association type says that Departments have multiple Employees. An association instance
says that the Accounting entity has 12 Employee instances.
Remember however, we are still talking about reality. We aren't up to anything
concerning the computer yet.
Associations can be within and between entity classes...
a) Between Attributes (within entity class)
Associations can exist between entities within a single entity class.
For example, in the Employee entity class, some employees are managers while the rest are
workers. An association called MANAGES lets you determine which employees work for a
given manager (who is also an employee). This is a recursive association.
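The MANAGES example can be sketched as a mapping from one employee to another. The ids and names below are hypothetical.

```python
# Hypothetical employee ids mapped to names (the entity instances).
employees = {1: "Dana", 2: "Eli", 3: "Fay", 4: "Gus"}

# MANAGES association instances: worker id -> manager id.
# Both ends refer to the same entity class, which is what makes it recursive.
manager_of = {2: 1, 3: 1, 4: 2}


def works_for(manager_id):
    """Names of the employees managed directly by manager_id."""
    return sorted(employees[w] for w, m in manager_of.items() if m == manager_id)


print(works_for(1))   # the employees who report to Dana
```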
b) Between Entity Classes (between entity classes)
Associations can also exist between entities in different entity classes. This is the
equivalent of associations between files (on another level of data abstraction).
For example, the PRODUCT entity is associated with the CUSTOMER entity by the
association ORDERED-BY. Or, the STUDENT entity is associated with the CLASSES
entity by the association ENROLLED-IN.
This (again) is where database is different from traditional file processing. The database
approach can capture this information while the file based approach cannot.
The 2nd level of data...
B. Metadata (Structure)
You can't really directly do much with reality on a computer system. You are normally
limited to capturing the essence of reality. This is the metadata level.
This information is normally stored in the Data Dictionary or the Repository.
You will notice some parallels in metadata and reality. This is intentional. The metadata
is supposed to capture the essential elements of reality (and thus form an abstraction or
model of reality that can be coded into the database).
More specifically:
reality                  metadata
entity class   ----->    record type
attribute      ----->    data item type
associations   ----->    relationships
Be very careful to distinguish metadata from data. This is equivalent to the distinction
between entity instance and entity class.
1st element of metadata is the data item...
1. Data Item
A data item is the smallest named unit of stored data. Also known as data element, field,
or attribute. Field usually has a physical connotation (area on a disk drive). Attribute
usually is associated with the reality level.
A data item is indivisible by definition. In other words, the organization has no need to
view its component parts individually if they exist.
For example, Employee social security number, salary, department number are all data
items at the metadata level.
Data items have certain characteristics normally used to describe their structure to the data
dictionary (or repository).
a) Name
Data items must have names (so you can refer to them). Because they can be "qualified",
the names do not need to be unique in the entire database. (They do need to be unique
locally however.)
For example: You can have a data item named ADDRESS in both the EMPLOYEE record
type and the CUSTOMER record type without problems. This would be notated
EMPLOYEE.ADDRESS and CUSTOMER.ADDRESS.
Data item names must be unique within a single record type.
Some implementation models put restrictions on data item names, but conceptually there
are none.
b) Type
The "type" of a data item determines what kind of data can be stored in it and what
operations can be performed on it.
For example, you have
NUMERIC -- numbers
CHARACTER -- text
DATE -- dates
You can add and subtract NUMERIC and DATE, you can also multiply and divide
NUMERIC, but you can only perform addition on data of type CHARACTER (this is
called string concatenation).
There are lots of data types (different for each DBMS) and you can sometimes define your
own (called "abstract data types"). For example, data type for suits in playing cards would
allow HEART, CLUB, SPADE, and DIAMOND.
c) Length
Determines the number of characters allowed in the data item. This is the maximum
number allowed (less is usually OK).
It should not have anything to do with the way it is actually stored, but it usually does
(i.e., 1 char = 1 byte).
4th characteristic of data items...
d) Source (actual/virtual)
The "source" of data tells where it comes from. "Actual" data items are actually stored
somewhere on disk. "Virtual" data items are not. They are derived when needed.
For example, the EMPLOYEE record type might include both BIRTHDATE and AGE data items.
You would not really store the AGE data item because it would keep getting out-of-date.
Instead you calculate it when needed using the birthdate and the system date. The user is
not aware of this.
This is similar to the difference between "logical" and "physical" data items. Physical data
is actually stored in the format presented to the user while logical data is not. Logical
data can leave out data items or rearrange them.
e) Domain
The "domain" of a data item is the set of allowable values it may take on.
For example, the domain for the GPA data item is real numbers between 0.0 and 4.0
inclusive. Thus, 4.3 is not in the domain, so it is illegal.
There are 2 types of domains:
- Implicit domain - determined from the type and range of the data item
- Explicit domain - a list of the specific allowable values
Domain is useful for integrity checks and should be built into the metadata of the
Database.
6th (last) characteristic of data items...
f) Value
The "value" is the specific domain value stored in each data item instance.
If you don't know the value, a DBMS usually uses the special value called NULL to hold the
place. This is different from blank or 0 because it means that the value is unknown. This
is the source of many problems in advanced database use.
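The data item characteristics covered in this section (name, type, length, source, domain, value) can be pulled together in a small Python sketch. Everything here is an assumption for illustration: the DataItem class is a hypothetical repository entry, not the format of any real data dictionary, and Python's None stands in for NULL.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class DataItem:
    """Metadata for one data item (hypothetical repository entry)."""
    name: str
    type_: type
    length: Optional[int] = None                        # max characters, if any
    domain: Optional[Callable[[object], bool]] = None   # membership test

    def accepts(self, value):
        if value is None:            # NULL: value unknown; allowed in this sketch
            return True
        if not isinstance(value, self.type_):
            return False
        if self.length is not None and len(str(value)) > self.length:
            return False
        return self.domain is None or self.domain(value)


# Implicit domain: follows from the type plus a range.
gpa = DataItem("GPA", float, domain=lambda v: 0.0 <= v <= 4.0)
# Explicit domain: the allowable values are listed.
suit = DataItem("SUIT", str, length=7,
                domain=lambda v: v in {"HEART", "CLUB", "SPADE", "DIAMOND"})


def age(birthdate, today):
    """Virtual data item: derived from BIRTHDATE and the system date, never stored."""
    before_birthday = (today.month, today.day) < (birthdate.month, birthdate.day)
    return today.year - birthdate.year - before_birthday
```

With these definitions, gpa.accepts(4.3) is rejected because 4.3 falls outside the domain, while gpa.accepts(None) passes because an unknown value is not the same as an illegal one.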
2nd level of metadata...
2. Data Aggregate
Data items are sometimes grouped together to form data aggregates.
These are named groups of data items. They are used to connect several related data items
together. For example, ADDRESS could be made up of STREET, STATE, and ZIP.
Normally you create data aggregates because you need to be able to see both the
individual data items in some cases and the group of data items in other cases.
You can build arbitrarily complex hierarchies of data aggregates (i.e., ones made up of
other items and aggregates).
You can store the same characteristics for data aggregates that you store for data items
(e.g., NAME, LENGTH, TYPE), or you can let system defaults take over for some of the
characteristics (not all).
3rd level of metadata...
3. Record
A record is a named collection of data items and/or data aggregates. Usually, all data
items of interest about a specific entity class are stored in a record type. The table below
shows how the terms covered so far are related to each other.
Metadata             Data                      Reality
RECORD        --->   RECORD OCCURRENCE  --->   ENTITY
RECORD TYPE   --->   FILE               --->   ENTITY CLASS
DATA ITEM     --->   FIELD              --->   ATTRIBUTE
For example, for the student entity class you have a STUDENT record type and a data file.
A record contains several data items (one for each attribute of an entity). In the
STUDENT record type you could have data items for ID, NAME, ADDRESS, MAJOR,
GPA, ... .
The same characteristics (e.g., name, length, components...) for record types are stored in
the repository as are stored for data items with a few additions.
1st Additional characteristic of records...
a) Keys (Primary, Secondary)
A key is a data item (or several data items put together) used to identify a record or group
of records.
This is the equivalent concept to identifier in the reality realm.
For example, the "key" of the EMPLOYEE record type would probably be the social
security number because it identifies each specific employee record.
There are several types of keys, we'll look at 2 for now ...
(1) Primary Key
A primary key is one or more data items that uniquely identifies a specific record.
As stated above, the primary key for EMPLOYEE would be social security number. For a
PURCHASE-ORDER it could be the PO-NUM printed on the top of each sheet.
There may be several potential primary keys (called candidate keys) in a record.
There also are cases where the primary key is made up of several data items (called
composite key or concatenated key).
Every record must have a primary key so you can tell specific records apart and that key
cannot usually be NULL.
(2) Secondary Key
A secondary key is one or more data items that identifies several records with the same
value for the data item(s).
For example, in the EMPLOYEE record the JOB-TITLE data item could be a secondary
key because it would allow you to quickly identify all employees with the same job title.
Secondary keys do not uniquely identify record instances (if one did, you would call it
a "candidate key").
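The difference between the two kinds of key can be sketched with two lookup structures in Python. The record layout and sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE record occurrences.
records = [
    {"SSN": "111-11-1111", "NAME": "Ann", "JOB-TITLE": "Clerk"},
    {"SSN": "222-22-2222", "NAME": "Bob", "JOB-TITLE": "Clerk"},
    {"SSN": "333-33-3333", "NAME": "Cy", "JOB-TITLE": "Analyst"},
]

# Primary key: each value identifies exactly one record.
by_ssn = {}
for rec in records:
    if rec["SSN"] is None or rec["SSN"] in by_ssn:
        raise ValueError("primary key must be non-NULL and unique")
    by_ssn[rec["SSN"]] = rec

# Secondary key: each value identifies a group of records.
by_title = defaultdict(list)
for rec in records:
    by_title[rec["JOB-TITLE"]].append(rec["NAME"])

print(by_ssn["222-22-2222"]["NAME"])   # exactly one record
print(by_title["Clerk"])               # every record with that job title
```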
2nd additional characteristic of records...
b) Intersection Records
Some record types describe associations between entities instead of describing the entity
itself. These are called intersection records.
Intersection records are used when you want to store data about the relationship or when
the entities are related in a complex way called many-to-many (more on this later).
4th (last) aspect of metadata...
4. Relationship
Relationships are the metadata level version of associations between entities and entity
classes.
NOTE: You can have relationship types and relationship instances. A type connects
entity classes while the instance connects specific entity occurrences. The first is
metadata level while the second is data level.
As stated above, you can capture relationships at the metadata level using intersection
records. Thus, relationships can have attributes just like entity classes.
For example, you have EMPLOYEE entity class and PROJECT entity class associated by
WORKS-ON relationship type. You could have a NUMBER-OF-HOURS attribute in the
relationship type to denote hours for specific instances of employees working on projects.
Traditional file based processing is unable to capture this information in the data.
Instead, you had to write programs to accomplish the association.
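As a rough sketch, the WORKS-ON relationship with its NUMBER-OF-HOURS attribute could be held as intersection records keyed by the identifiers of both entities. The ids, names, and hours below are invented for illustration.

```python
# Hypothetical record occurrences keyed by their identifiers.
employees = {"E1": "Ann", "E2": "Bob"}
projects = {"P1": "Payroll", "P2": "Billing"}

# WORKS-ON intersection records: (employee, project) -> NUMBER-OF-HOURS.
# The hours belong to the relationship, not to either entity alone.
works_on = {
    ("E1", "P1"): 10,
    ("E1", "P2"): 5,
    ("E2", "P1"): 8,
}


def hours_for_employee(emp_id):
    """Total hours one employee has logged across all projects."""
    return sum(h for (e, _), h in works_on.items() if e == emp_id)


def staff_on_project(proj_id):
    """Names of the employees working on one project."""
    return sorted(employees[e] for (e, p) in works_on if p == proj_id)
```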
3rd level of data...(reality, metadata, data)
C. Data (occurrences)
The 3rd level of data is concerned with the actual data in the database itself. It consists of
data instances or occurrences.
For each entity in the real world there is an occurrence of a corresponding record in the
database. For example, for each student in the university there is an occurrence of a
student record.
So, while there is only one STUDENT record type (which is described in the data
dictionary and corresponds to the student entity class), there may be thousands of student
record occurrences.
Similarly, there are many instances of each of the data item types that correspond to
attributes. So each record instance is made up of a group of data item instances that
correspond to attributes in reality.
1st aspect of data level...
1. Record Occurrence
A record occurrence holds data about a specific entity in reality.
For example, the university has a specific record occurrence for you on file.
2nd aspect of data level...
2. Field
Each record occurrence is built of fields. The fields hold values concerning the attributes
of the entity.
For example, your record occurrence has a field with your address, your GPA, your
major... .
3. File
A file is a named collection of all occurrences of a given record type.
For example, the record occurrences for all students in the university make up the
STUDENT file.
A file can be visualized as a 2-dimensional table. This is called a "flat file" and is the
highest level of abstraction possible in the traditional file based approach.
Note that the "flat file" limitation does not allow you to store information about
associations.
Highest level of data level...
4. Database
A database is a named collection of interrelated files. Thus it is able to describe both the
data occurrences and the associations between them.
III. Associations Between Data Items
Now that you can distinguish between the 3 levels of data, you need some background on
associations between data items.
This section introduces you to the different types of associations and to ways of
representing them graphically.
A. Types of Associations
There are 4 different types of associations:
- none
- 1:1
- 1:N
- M:N
We will ignore "none" and concentrate on the 3 primary associations between data items.
Remember, the purpose is to capture the essence of reality. You want to model (or
abstract) the way things are so you can simulate the organization in the computer.
An association implies that the values for the associated data items are in some way
dependent on each other.
1. One-to-One Association
A one-to-one association means that at a particular moment in time each value of one data
item 'X' is associated with up to 1 value of data item 'Y'. This is typically written 1:1.
For example, there is a one-to-one association between the data item STUDENT-NUM
and STUDENT-NAME. It is also true that a 1:1 association exists for the reverse.
In real life, true 1:1 associations are rare. In our culture, husband-to-wife is a 1:1
association.
One common way to graph this is with the following bubble chart.
manager-name <---------------> department-name
2. One-to-Many Association
A one-to-many association means that it is possible for each value of data item 'X' to be
associated with 0, 1 or more values of data item 'Y'.
For example, one STUDENT-NUM data item can be associated with 1 or more COURSE-NAME
data items. However, each COURSE-NAME data item is associated with exactly
1 value for the STUDENT-NUM data item.
STUDENT-NUM <--------------->> COURSE-NAME
When you reverse a one-to-many association it becomes a many-to-one. (Same, except
viewed from the other side.)
The key to this is that the association at the metadata level is between data items while at
the data level it is between specific instances of field values.
These are very common in the reality level. Can you think of examples? Try
father-to-child, department-to-employee.
3. Many-to-Many Associations
A many-to-many association means that a value of 'X' is associated with 0, 1 or more
values of 'Y'. Likewise, each value of 'Y' is associated with 0, 1 or more 'X's.
For example, 1 or more EMPLOYEE-NUM can work on 1 or more PROJECTS, and each
PROJECT can have 1 or more workers.
PROJECT-ID <<---------->> EMPLOYEE-NUM
These are also fairly common in the real world.
They are difficult conceptually because each individual data item can be associated with
many others. Usually an intermediate entity (an intersection record) is created to handle
the mapping.
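One way to make the three association types concrete is to classify a set of instance pairs. The sketch below is an assumption-laden illustration: the function name and sample data are hypothetical, and it only reports what a given snapshot of data exhibits. The true association type is a rule of the organization, which the data at one moment may not fully reveal.

```python
from collections import defaultdict


def classify(pairs):
    """Classify (x, y) instance pairs as "1:1", "1:N", "N:1", or "M:N"."""
    ys_per_x, xs_per_y = defaultdict(set), defaultdict(set)
    for x, y in pairs:
        ys_per_x[x].add(y)
        xs_per_y[y].add(x)
    x_many = any(len(ys) > 1 for ys in ys_per_x.values())   # some x has several y
    y_many = any(len(xs) > 1 for xs in xs_per_y.values())   # some y has several x
    if x_many and y_many:
        return "M:N"
    if x_many:
        return "1:N"
    if y_many:
        return "N:1"
    return "1:1"


print(classify([("P1", "E1"), ("P1", "E2"), ("P2", "E1")]))   # projects and employees
```

Reversing each pair turns a "1:N" result into "N:1", which matches the remark that a reversed one-to-many is a many-to-one.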
4. Conditional Associations / Existence Dependency
Technically, we have been talking about the cardinality of the association.
Cardinality is a restriction on the number allowed to participate in the association. Lines
between bubbles represent mappings and arrow heads represent the cardinality.
Another aspect of cardinality is the concept of conditional associations. In these, you put
a range on the allowable values.
For example, a conditional association from SEAT-NUMBER to STUDENT-NUM could
indicate that a seat would have 0 or 1 student at any moment in time.
seat-no <-------------O--> student-num
Conditional associations can also be placed on both sides of the relationship. Also,
they can be combined with 1:1, 1:N, and M:N.
For example, one TEACHER-NAME could have 0, 1, or many CLASS-NAME and one
CLASS-NAME could have 0 or 1 TEACHER-NAME (until the schedule is firmed up).
teacher-name <-O-----------O-->> class-name
The opposite of a conditional association is an existence dependency. This says that one
instance of the data value cannot exist without another.
The effect is to place a lower bound of 1 on the conditional range of the association.
For example, when the semester gets underway, each CLASS must have at least 1
TEACHER and each TEACHER may teach 0, 1, or many CLASSES.
teacher-name <--|---------------O-->> class-name
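Conditional associations and existence dependencies both amount to a lower and an upper bound on how many instances may participate. A minimal sketch, assuming a hypothetical helper and made-up class data:

```python
def within_bounds(counts, minimum, maximum=None):
    """True if every participation count lies between minimum and maximum.

    maximum=None stands for "many" (no upper bound).
    """
    return all(c >= minimum and (maximum is None or c <= maximum) for c in counts)


# Hypothetical data: the teachers currently assigned to each class.
teachers_per_class = {"BACS485": ["Lee"], "BACS200": []}
counts = [len(t) for t in teachers_per_class.values()]

# Conditional association: 0 or 1 teacher per class (schedule not yet firm).
print(within_bounds(counts, 0, 1))
# Existence dependency: at least 1 teacher per class (semester under way).
print(within_bounds(counts, 1))
```

The same data passes the first check and fails the second, because one class still has no teacher.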
There are other aspects of associations that we will cover later.
B. Graphing Data Associations
There are numerous ways to graph data associations between data items. I'll cover the one
that we'll use in this course (it is one of the most popular methods in industry).
1. Bubble Chart
We have been using bubble charts for the last few examples. In them, data items are
represented by named bubbles, mapping by lines and cardinality by arrows. We also
introduced some conditional notation with small circles and lines.
Another notation used is to underline the name of the bubble that is the identifier (key) of
the record type.
Bubble charts are useful for grouping data items into records and for deriving more
complex data models.
Note that you can represent record types and record occurrences with bubble charts.
We will use them to represent functional dependencies in the conceptual design lectures
later in the semester.
IV. Associations Between Records
When you group data items into record types then you can represent a higher level of data
associations. These are associations between records.
Again, remember the difference between record type and record occurrence (different
levels of data). Associations between record types are at the metadata level while
associations between record occurrences are at the data level.
You graph these associations in several ways. Initially I will use what are called data
structure diagrams. These are blocks connected by lines with "crows feet". The lines are
labeled with the name of the association. Other methods exist and we will cover them
later.
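[Data structure diagram: a STUDENT block (STUDENT #, ADDRESS, MAJOR, GPA,
CLASSIFICATION) connected by a line labeled TAKES to a COURSE block (COURSE #, ROOM,
CREDIT, TIME, INSTRUCTOR).]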
A. Types of Associations
As before, there are 4 types of associations. The first is none and we generally ignore it
because we are only interested in situations of interest to the organization.
1. One-to-One
Two record types can be associated by a one-to-one relationship. The associations
between record types means the same thing as between data items. That means the
relationship goes both ways.
The difference, of course, is the level of abstraction of the data (you are working with a
bigger chunk with records).
Note that the diagrams make it look like data items are still connected. This is not correct.
The whole record is associated with the other, not just the place where the lines are drawn.
You can also have conditional relationships with record types.
2. One-to-Many
A one-to-many relationship says that one occurrence of a record type is related to 0, 1 or
more instances of another record type.
If you reverse it, it becomes a many-to-one association.
3. Many-to-Many
A many-to-many relationship connects 0, 1 or more instances of one record type to one or
more of another. The reverse of a M:N is denoted a N:M.
With data structure diagrams you can represent a M:N relationship directly. This is handy
and allows you to capture the true essence of the real organization.
Depending upon what entities you are modeling, a M:N can be represented as two 1:N
relationships. For example, the following say the same thing:
invoice <<--------->> product
OR
invoice <------->> line_item <<--------> product
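The equivalence of the two diagrams can be sketched in Python: every invoice/product pair becomes one line_item record that points to exactly one invoice and exactly one product. The invoice numbers and product names are hypothetical.

```python
# Hypothetical M:N instances: (invoice, product) pairs.
invoice_product = [("INV1", "bolt"), ("INV1", "nut"), ("INV2", "bolt")]

# One line_item record per pair; each points to exactly one invoice and one product.
line_items = [
    {"line_id": n, "invoice": inv, "product": prod}
    for n, (inv, prod) in enumerate(invoice_product, start=1)
]


def products_on(invoice):
    """Follow invoice -> line_item (1:N), then line_item -> product (N:1)."""
    return sorted(li["product"] for li in line_items if li["invoice"] == invoice)


def invoices_for(product):
    """Follow product -> line_item (1:N), then line_item -> invoice (N:1)."""
    return sorted(li["invoice"] for li in line_items if li["product"] == product)
```

No information is lost: the original pairs can be read back out of line_items, and the line_item record also gives a place to store data about the pairing itself.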
4. Loop/Cycle (recursive)
A cycle is a path that begins at an occurrence of a given record type and proceeds through
a set of related occurrences of different types and eventually leads back to the original
starting type (though not necessarily the same occurrence).
A loop (also called a recursive relationship) is a 1:N relationship among occurrences of
the same type.
5. Required or Optional Existence
Record types can also have conditional associations. This is the situation where you limit
the range of the 1:1, 1:N, and M:N.
For example, assume that a company is set up so that each manager has 0 or 1 secretary
and each secretary is either unassigned or assigned to 1 manager (no multi-manager
secretaries).
secretary <--O---------------O--> manager
As stated before, 1:1 associations are fairly rare.
An example of a conditional 1:N association is a hospital where each patient can have 0,
1, or many tests, but each test must be associated with exactly 1 patient.
patient <--|----------------O-->> test
An example of a conditional M:N association is a hospital where each patient can have 1
or many physicians and each physician can have 0, 1, or many patients.
patient <<--O------------|-->> physician
B. Graphing Record Associations
Graphing the structure of an organization is very important. The information in verbal
form may be correct, but not of much use to database designers (because of ambiguities
inherent in language).
Once you know the basics of graphing entity types and relationships you can build
organizational models of arbitrary complexity that are unambiguous and "semantically rich".
1. Data Structure Diagram (Bachman)
You have already been introduced to DSDs. They are also called Bachman charts after
Charles Bachman (the man who first used them).
True Bachman charts use single and double arrows without labels. There is no way to
show conditional associations. Because of these limitations, DSDs have been modified to
the form we learned.
These diagrams have the advantage of being simple to understand and easy to draw;
however, they don't give a whole lot of information. Because of this, other diagramming
techniques have been developed.
2. Entity-Relationship (E/R)
The Entity-Relationship (E/R) diagram puts more emphasis on the relationship between
entities than does the DSD method.
Some textbooks jump right into the complex form of the E/R diagram where you indicate
conditional relationships for every line. I prefer to ease you into it, so I won't put the extra
cardinality symbols for now.
The box symbol still represents record types and the line indicates mapping. The diamond
symbol is new. It stands for the relationship itself and must be named.
For example, for husband---marriage---wife the 1:1 relationship using E/R diagram would
be:
          1                          1
husband ---------- marriage ------------- wife
Note how there are no arrows or crows feet, thus it is a 1:1. If you wanted to include
conditional information you would do the following:
husband --|------------|-- marriage --|--------------|-- wife
A 1:N E/R example would be:
             1                          N
department ---------- employees -------------- employees
OR
department ---------- employees -------------< employees
A M:N E/R example would be:
            M                       N
projects ----------- have --------------- tasks
OR
projects >---------- have ---------------< tasks
Note how the name of the relation implies the perspective of the relationship. The
direction implied is part of the semantics of the organization you are modeling.
            M                         N
products ------------ contain ------------ parts

         M                            N
parts----------------- make up ------------ products

            M                           N
teacher --------------- teaches ------------ students
Bubbles represent attributes of entities and relationships in the E/R model. An
underlined bubble represents a primary key.
3. Relational Notation (not really a diagramming technique)
This is not really a diagramming technique, but it is often used to denote entities,
attributes, and the primary keys. It can also be used to imply the relationships (though not
directly).
I call this relational notation but different books call it different things. It is really just a
DSD without the boxes and the lines. For example:
PRODUCT (PRODUCT-NUM, DESCRIPTION, PRICE, QTY-ON-HAND)
VENDOR (VENDOR-NUM, VENDOR-NAME, VENDOR-CITY)
SUPPLIES (VENDOR-NUM, PRODUCT-NUM, VENDOR-PRICE)
Note that SUPPLIES has the primary keys of both entities, so you could "look up" the
price if you know both VENDOR-NUM and PRODUCT-NUM.
Technically, SUPPLIES is an intersection record that resolves the M:N relationship between
PRODUCT and VENDOR into 2 1:N relationships.
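As an illustration only, the three record types above can be declared with Python's standard sqlite3 module. Several things here are assumptions: the column types, the sample rows, and the use of underscores in place of hyphens (hyphens are not convenient in SQL identifiers). Treat it as one possible sketch, not the way this notation must be implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (product_num INTEGER PRIMARY KEY, description TEXT,
                       price REAL, qty_on_hand INTEGER);
CREATE TABLE vendor   (vendor_num INTEGER PRIMARY KEY, vendor_name TEXT,
                       vendor_city TEXT);
-- Intersection record: composite primary key built from the keys of both parents.
CREATE TABLE supplies (vendor_num  INTEGER REFERENCES vendor(vendor_num),
                       product_num INTEGER REFERENCES product(product_num),
                       vendor_price REAL,
                       PRIMARY KEY (vendor_num, product_num));
INSERT INTO product  VALUES (1, 'bolt', 0.25, 500);
INSERT INTO vendor   VALUES (10, 'Acme', 'Denver'), (20, 'Apex', 'Greeley');
INSERT INTO supplies VALUES (10, 1, 0.10), (20, 1, 0.12);
""")


def vendor_price(vendor_num, product_num):
    """Look up the price using both parts of the composite key."""
    row = conn.execute(
        "SELECT vendor_price FROM supplies WHERE vendor_num = ? AND product_num = ?",
        (vendor_num, product_num),
    ).fetchone()
    return row[0] if row else None
```

Knowing only one of the two keys is not enough to find a single price, since the same product can be supplied by several vendors at different prices.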
V. ANSI 3-Level Model
Now you are able to represent the semantics (meaning) of a real-world organization via
the DSD and E/R diagramming method.
You can generate a model of the organization. Models are useful because they represent
the essential basics of a system without all the clutter of detail.
For example, a working model of a house can help you locate the furniture. Likewise, a
scale model of a city in an earthquake can help you see if buildings will stand up.
Models are useful abstractions of reality.
Models have always been used in the database field. However, originally designers tried to
code the model directly into a big program. TOO COMPLEX!
When reality changed, the model was out-of-date and it was too hard to update it. (Exact
same problems as traditional file based system.)
In the mid-1970s the ANSI/X3/SPARC committee proposed that a way to avoid these problems
was to have 3 levels of model. In this way you could change the low level physical details
and the implementation details without having to bother the users.
There are 3 levels to the ANSI/SPARC model.
1. Conceptual
You haven't known it, but I have been diagramming the middle layer of this model in the
discussion so far. This layer is known as the conceptual model of the organization.
The E/R diagramming technique is normally used to capture the structure (metadata) of
the enterprise (notice how I tied the level of data to the level of the model -- not an
accident).
There is 1 conceptual model for an organization. All the semantics of interest to that
organization are captured in the model.
At the conceptual level, the DBA is concerned with entities, attributes, and relationships.
The conceptual view is totally independent of the hardware used and the data that specific
users want to see.
A. External (view, sub-schema)
The External model (or view, or subschema) is concerned with the way the user views the
organization.
This is a subset of the conceptual model because users do not need (or want) to see the
whole database. Thus, there are many external views (contrast that to a single conceptual
model.)
Each user is able to define the entities, attributes, and relationships they require, but they
cannot touch other areas of the database. Sort of "free security".
These views are also independent of the hardware and the DBMS used.
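One way to picture an external view is as an SQL view defined over the conceptual tables.
The sketch below uses Python's sqlite3 module. The EMPLOYEE table, its columns, and the
"directory" user are all assumptions made up for illustration. SQLite has no user
accounts, so this shows only the subsetting, not the access control.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_num INTEGER PRIMARY KEY,
    name    TEXT,
    dept    TEXT,
    salary  REAL
);
INSERT INTO employee VALUES (1, 'Ann', 'Sales', 52000), (2, 'Bob', 'IT', 61000);

-- External view for a phone-directory user: the salary column is left out.
CREATE VIEW directory_view AS
    SELECT emp_num, name, dept FROM employee;
""")

cursor = conn.execute("SELECT * FROM directory_view ORDER BY emp_num")
view_columns = [col[0] for col in cursor.description]  # what this user can see
view_rows = cursor.fetchall()
```

Many such views can be defined over the same tables, while the tables themselves are
defined once.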
B. Internal (Implementation)
The internal level (also called the implementation model or the schema) defines the whole
database in a technology-dependent style. In other words, this level is concerned with
the specific computer and the physical details of the DBMS.
This level is needed because the conceptual level is hardware independent.
There are 3 basic implementation models
- hierarchical
- network
- relational
1. Physical (not really a model)
Below the internal level you have the physical reality of the computer.
This includes the low level details of access methods and pointers and disk mirroring, etc.
The purpose of the ANSI/SPARC 3-level model is so you don't have to worry about this
stuff above the internal level.
C. Data Independence
The ANSI 3-level model helps "insulate" the upper levels from low level detail. In that
way you can change the computer without affecting users. Also, you should be able to
change the DBMS without affecting the users.
This is called data independence. Officially it can be defined as the capacity to change the
model at one level of a database system without having to change the model at the next
higher level.
There are 2 types of data independence...
1. Physical Data Independence
When physical independence exists, changes can be made to the physical characteristics
of the data (like moving to a new disk or changing an indexed field) without affecting
existing programs at the external level.
2. Logical Data Independence
When logical independence is present, fields can be added or deleted (or their names and
specs changed) without having to recompile existing programs.
In other words, the logical link between how the data is defined and how it is used is
flexible (independent).
Each level of independence is really a mapping between the two models. When the lower
model changes, all you have to do is update the mapping. The data dictionary (repository)
holds these mappings.
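The idea of "just update the mapping" can be sketched in a few lines of Python. This is
only a toy under stated assumptions: the record layout, the field names, and the fetch
function are all hypothetical, and a real DBMS keeps its mappings in the data dictionary
rather than in a Python dict.

```python
# Stored records as the lower level holds them (physical field names are invented).
stored_records = [{"CUST_NO": 7, "CUST_NM": "Ann", "CUST_CTY": "Greeley"}]

# The mapping from upper-level names to lower-level names.
mapping = {"customer_num": "CUST_NO", "name": "CUST_NM", "city": "CUST_CTY"}

def fetch(records, mapping, wanted):
    """Return the wanted upper-level fields by translating through the mapping."""
    return [{field: rec[mapping[field]] for field in wanted} for rec in records]

before = fetch(stored_records, mapping, ["name", "city"])

# The lower level changes: the city field is stored under a new name.
stored_records = [{"CUST_NO": 7, "CUST_NM": "Ann", "CITY_TXT": "Greeley"}]
mapping["city"] = "CITY_TXT"  # only the mapping is updated

# The same request, written against the upper-level names, still works.
after = fetch(stored_records, mapping, ["name", "city"])
```

The request itself never mentions CUST_CTY or CITY_TXT, which is why it survives the
change.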
VI. Semantics of Data Models
Now that you are familiar with data modeling and the ANSI 3-level approach, we can go a
little deeper into the semantics of data models.
Remember that semantics are the meaning of the data. A good model captures the basic
meaning without getting into too much detail.
Actually a good model is a balance between too much and not enough semantic detail.
The more detail you capture, the more realistic the model is (but there is a limit where too
much information is counter-productive.)
What follows are typical data semantics captured by data models.
A. Cardinality (1:1,1:N,M:N)
We discussed cardinality before when we talked about 1:1, 1:N, and M:N relationships
between entity classes.
To review, the cardinality describes the nature of the relationship between entities. It can
be 1:1, 1:N, M:N.
B. Degree
The degree of a relationship describes the number of entities that can participate in the
relationship.
1. Unary, Binary, Ternary
Up to now we have seen strictly binary and unary (loop/cycle) relationships, but
relationships of higher degree are possible.
Ternary relationships happen occasionally. You cannot break a ternary relationship into 3
binary relationships. Instead you have to call the relationship an entity involved in 3 1:N
relationships.
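To see why three binary relationships are not a substitute, here is a small Python
sketch. The supplier/part/project data is invented for illustration and is not from the
notes. It splits a ternary relationship into three binary ones and then tries to put it
back together.

```python
# A ternary relationship: which supplier ships which part to which project.
supply = {("S1", "P1", "J2"), ("S1", "P2", "J1"), ("S2", "P1", "J1")}

# Three binary relationships, each obtained by dropping one participant.
supplier_part = {(s, p) for s, p, j in supply}
part_project = {(p, j) for s, p, j in supply}
supplier_project = {(s, j) for s, p, j in supply}

# Try to rebuild the ternary facts from the three binary ones.
rebuilt = {
    (s, p, j)
    for s, p in supplier_part
    for p2, j in part_project
    if p == p2 and (s, j) in supplier_project
}

# Facts that the binary relationships imply but that were never true.
spurious = rebuilt - supply
```

Here the rebuilt set claims that S1 ships P1 to J1, which was never stored. Keeping
SUPPLY as an entity with three 1:N relationships (one to each participant) avoids this.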
Higher degree relationships are possible, but stay away from them if possible because it
gets too complex.
C. Existence Dependency (Referential Integrity)
Existence dependency is the opposite of conditional association.
In other words it says that one entity cannot exist without the existence of an instance of
some other entity.
For example, a CUSTOMER order cannot exist without an associated CUSTOMER. Or, a
STUDENT-GRADE cannot exist without a related STUDENT.
Existence dependency is also called referential integrity. This is important in database
systems because you want to make sure the relationships between entities are "in sync".
For example, if you delete a CUSTOMER, you must also delete all ORDERS for that
customer or they will be hanging out there forever.
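A relational DBMS can enforce this for you. The following sketch uses Python's sqlite3
module. The table and column names are assumptions chosen to match the example, and
ON DELETE CASCADE is only one possible policy (a DBMS could instead refuse the delete).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite normally leaves foreign-key enforcement off unless asked.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    cust_num INTEGER PRIMARY KEY,
    name     TEXT
);
CREATE TABLE orders (
    order_num INTEGER PRIMARY KEY,
    cust_num  INTEGER NOT NULL
              REFERENCES customer(cust_num) ON DELETE CASCADE
);
INSERT INTO customer VALUES (1, 'Ann');
INSERT INTO orders VALUES (10, 1), (11, 1);
""")

# An order for a customer that does not exist should be rejected.
try:
    conn.execute("INSERT INTO orders VALUES (12, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the customer also deletes that customer's orders.
conn.execute("DELETE FROM customer WHERE cust_num = 1")
orders_left = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```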
D. Time
One of the more complex aspects of database theory is time. We won't get into this deeply
in this course.
In general, commercial DBMSs handle the time semantic poorly. They assume that you
are only interested in the current situation, not in any cumulative history.
Some of the difficulty occurs because not all of the database is modified at once. Only
one or two data items may change. So you wind up storing things called difference files
and having to reconstruct the database for the desired time-frame.
The problem is that the difference file chain soon becomes long and bulky (tough to
process).
An additional complication is that you can add and remove metadata entities over time.
This means that some data items are not valid except during a certain time-frame, so you
also have to store time related metadata.
Then there is the problem with derived data... it goes on and on.
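The difference-file idea can be sketched in Python as a base snapshot plus a chain of
change records. The data item names, the integer time stamps, and the state_as_of
function are all hypothetical. Real systems use far more elaborate structures.

```python
# Base snapshot of the database at time 0 (all values invented).
base = {"cust-1.city": "Greeley", "cust-1.credit": 500}

# Difference records: (time, data item, new value), kept in time order.
differences = [
    (3, "cust-1.credit", 750),
    (5, "cust-1.city", "Denver"),
    (8, "cust-1.credit", 900),
]

def state_as_of(base, differences, when):
    """Rebuild the database state at time `when` by replaying the chain."""
    state = dict(base)
    for time, item, value in differences:
        if time > when:
            break  # relies on the chain being sorted by time
        state[item] = value
    return state

at_time_4 = state_as_of(base, differences, 4)
at_time_8 = state_as_of(base, differences, 8)
```

Every query about the past has to walk the chain from the start, which is why a long
chain gets expensive to process.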
E. Uniqueness
The primary concern with uniqueness is related to primary keys.
At least one data item must uniquely identify each record in a record type.
Another aspect of uniqueness requires that specific instances of a record be associated
with exactly 1 instance of another record.
For example, consider the M:N relationship SECTION between STUDENT and GRADE: each
STUDENT in a given SECTION gets exactly 1 GRADE.
A 3rd type of uniqueness is called exclusivity. Exclusivity is like an either-or filter
between entity types. Either one relationship between the entities or another may exist,
but they both may not.
For example, the book mentions how 2 unary relationships can exist between
EMPLOYEES, the MANAGES and the MARRIED relationships. Either can exist
between 2 occurrences, but not both (i.e., you cannot be married to your boss).
F. Class/Subclass (Generalization and Specialization)
Sometimes an entity is made up of several classes of similar "sub-entities" that should
really be handled differently occasionally.
For example, the conceptual model of an organization may have an EMPLOYEE entity.
This entity could have been broken down into HOURLY-EMPLOYEES and
SALARY-EMPLOYEES for a specific purpose (e.g., payroll).
However, that division would cause problems with the rest of the model (you would have
to have 2 relationships instead of one for everything that connected to the old
EMPLOYEE).
To solve this problem you have a class/subclass structure. One entity is the "parent" and
the others are the "children". The children could be thought of as specializations of the
parent and the parent can be thought of as a generalization (abstraction) of the children.
In this way you get the best of all worlds. You can have the EMPLOYEE entity and the
subclass HOURLY and SALARY entities in the same model.
Usually the models that allow class/subclass relationships also have inheritance.
Inheritance allows the children to implicitly keep the structure of the parent without
duplicating it on the lower level. Any specific differences between parent and child are
stored in the children (for example, HOURLY would have HOURLY-RATE and
SALARY would have MONTHLY-WAGE).
In diagramming this, usually an oval with ISA is drawn. It says that the child ISA parent
(e.g., HOURLY-EMPLOYEE ISA EMPLOYEE)
In the diagram, specialization is down and generalization is up.
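Class/subclass with inheritance looks much the same in an object-oriented programming
language. Here is a minimal Python sketch of the EMPLOYEE example. The emp_num and name
attributes and the sample values are assumptions, since the notes only name HOURLY-RATE
and MONTHLY-WAGE.

```python
from dataclasses import dataclass

@dataclass
class Employee:
    """The parent (generalization): structure shared by every employee."""
    emp_num: int
    name: str

@dataclass
class HourlyEmployee(Employee):
    """Child: inherits emp_num and name, adds only what is different."""
    hourly_rate: float

@dataclass
class SalaryEmployee(Employee):
    """Child: inherits emp_num and name, adds only what is different."""
    monthly_wage: float

staff = [HourlyEmployee(1, "Ann", 18.50), SalaryEmployee(2, "Bob", 4200.00)]

# Anything that connects to EMPLOYEE can treat both children the same way.
names = [e.name for e in staff]
all_are_employees = all(isinstance(e, Employee) for e in staff)
```

The isinstance check is the ISA test: HOURLY-EMPLOYEE ISA EMPLOYEE.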
Next, we move on to looking at the actual data models. You have already seen DSD and
E/R models as conceptual data models. Next, I will present 3 more conceptual models:
- Relational model
- Semantic data model
- Object-Oriented model
But first, a glance at the extended E/R model
VII. Extended E/R Data Model
The original E/R model was just boxes, diamonds, and arrows. It did not capture the
semantics of conditionality, existence dependency, class-subclass, participation, and so on.
Since then, it has been extended several times to make it a more powerful tool.
MultiValued Attribute - Double circle represents an attribute that can have more than one
value (e.g., STUDENT-MAJOR).
Conditional Association - Small circles show when the cardinality is 0, 1, or more for a
specific relationship. Sometimes this is denoted with a dotted line.
Existence Dependency - Short line shows when cardinality range starts at 1 (i.e.,
mandatory existence).
Exclusive Relationship - An arc that says that either one or the other (but not both)
relationship types are used (e.g., service performed by nurse or doctor, but not both).
Exclusive Occurrence - Convex lens says that a specific occurrence may participate in one
or the other relationship, but not both (e.g., MANAGE or MARRIAGE, but not both).
Class/Subclass - Shaded boxes denote generalization (up) and specialization (down).
Sometimes an ISA oval is used instead.
We won't use this much in class; however, I do want you to realize that it exists and be
familiar with the symbols given in the book. A few years after you go out and get jobs
this probably will be used frequently.
VIII. Relational Model
The relational model was first proposed by Codd in 1970. It is based on a branch of
mathematics called set theory (thus the name "relational": a relation is the mathematical
term for a set of tuples whose values are drawn from other sets).
The relational model is a definitional one--that is, it is intended to define the metadata to
the DBMS. For humans to use it, however, it does have a common graphical
representation.
A relation can be viewed as a 2-dimensional table that is similar to what you normally
would call a flat-file.
The relational model is currently the most important DBMS model. It is both a high-level
conceptual model (sort of) and an internal level implementation model.
The relational model is popular for several reasons:
- it is fairly intuitive so non-technical users can understand it
- it is designed around sets and set operations so you can design technically "good"
structures more easily (you have the tools built in)
- it maps easily from the E/R model to the implementation version of a relational
DBMS
- it has been implemented on PC platforms for a reasonable price - thus widely
available
- it has a mathematical basis, so you can design one on paper to work before starting to
code
- it is flexible (allows ad hoc queries via SQL and QBE) and it can be efficient on large
batch jobs
Relations have the following properties:
- each column contains values about the same attribute and each cell must have a
single value
- each column has a distinct name and the order of the columns is immaterial
- each row is distinct (no duplicates)
- the order of rows is immaterial
A. Table
A table is the physical representation of a relation. Tables hold information about entity
classes for the database.
A relation is represented as a 2-dimensional table in which the rows correspond to
individual entity occurrences (records) and the columns correspond to attributes (fields).
B. Tuple
Each row of a table corresponds to an individual record instance. These rows are called
tuples.
C. Attributes
Each column of the table contains values of a single attribute. Like a field on the data
level.
For example, the STUDENT-ID column of the table contains the values for the
STUDENT-IDs of all tuples.
D. Degree
The degree of the relation is a count of the number of attributes.
This has nothing to do with the semantic term degree which means the number of entities
participating in the relationship.
E. Cardinality
The cardinality of a relation is a count of the number of tuples. Again, no relation to the
semantic term cardinality (1:1, 1:N, M:N).
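Both counts are easy to see if you model a relation as a set of tuples in Python. This is
only a sketch. The STUDENT attributes and rows are invented, and a real DBMS table is not
literally a Python set.

```python
from collections import namedtuple

# The heading of the relation: one name per attribute.
Student = namedtuple("Student", ["student_id", "name", "gpa"])

# A relation is a set of tuples, so duplicates collapse and order is immaterial.
student = {
    Student(1, "Ann", 3.6),
    Student(2, "Bob", 2.9),
    Student(2, "Bob", 2.9),  # duplicate row, absorbed by the set
}

degree = len(Student._fields)  # number of attributes (columns)
cardinality = len(student)     # number of tuples (rows)
```

Here the degree is 3 and the cardinality is 2, because the duplicate row does not count
twice.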
F. Domain
A domain is the set of possible values for an attribute. For example, the domain of GPAs
is the real numbers between 0 and 4 inclusive.
The domain for CITY names is strings of alphabetic characters (restricted to names of
valid cities).
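Many relational DBMSs let you approximate a domain with a constraint on the column. The
sketch below uses a CHECK constraint in SQLite through Python's sqlite3 module. The
STUDENT table and its values are assumptions for illustration, and a CHECK is only a
rough stand-in for a true domain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    gpa        REAL CHECK (gpa BETWEEN 0 AND 4)  -- the GPA domain
)
""")

conn.execute("INSERT INTO student VALUES (1, 3.5)")  # inside the domain

# A value outside the domain should be refused by the DBMS.
try:
    conn.execute("INSERT INTO student VALUES (2, 4.7)")
    out_of_domain_rejected = False
except sqlite3.IntegrityError:
    out_of_domain_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

A domain like "names of valid cities" is harder. It usually ends up as a lookup table of
cities rather than a simple range check.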
G. Key (primary, foreign, cross-reference)
Each relation must have at least one attribute (or combination of attributes) that
uniquely distinguishes each tuple from the others.
The one chosen to identify tuples is called the primary key. If it is a combination of
attributes then it is called a concatenated key.
If more than one attribute (or combination) can do this, each is called a candidate key;
the candidates not chosen as the primary key are called alternate keys.
If an attribute in one relation is the primary key of another relation it is called a
cross-reference key or a foreign key. These are how the relational model connects distinct
relations.
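To make the idea of a candidate key concrete, here is a small Python sketch that searches
a sample relation for minimal attribute combinations with no repeated values. The
attributes, the rows, and the candidate_keys function are all invented for illustration.
Keep in mind that a key is really a statement about the meaning of the data: a sample can
rule a key out, but it cannot prove one.

```python
from itertools import combinations

attributes = ("student_id", "email", "name", "major")
rows = [
    (1, "ann@example.edu", "Ann", "CIS"),
    (2, "bob@example.edu", "Bob", "CIS"),
    (3, "ann2@example.edu", "Ann", "ACC"),
    (4, "ann3@example.edu", "Ann", "CIS"),
]

def candidate_keys(attributes, rows):
    """Return minimal attribute combinations whose values are unique in `rows`."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(range(len(attributes)), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            projected = {tuple(row[i] for i in combo) for row in rows}
            if len(projected) == len(rows):  # no two rows share these values
                keys.append(combo)
    return [tuple(attributes[i] for i in k) for k in keys]

found = candidate_keys(attributes, rows)
```

With this sample, STUDENT-ID and EMAIL each qualify. You would pick one as the primary
key, and the other becomes an alternate key.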
IX. Semantic Data Model
The Semantic Data Model (SDM) is not really intended for diagrams (although it does
have some nice graphical methods). It is a way to define the semantics of an organization
(a model).
The SDM is intended to capture all meaning of an organization's data and to describe it to
the DBMS as a set of metadata structures and integrity constraints -- pretty ambitious.
It contains all the features described for the extended E/R data model and more.
(Extended E/R got its ideas from SDM).
The SDM is not a single standard model. There are many active research projects
currently going on. It is all very interesting and very new.
It is most significant because it has generated interest in methods to store the meaning of
data and it donated some key concepts to the object-oriented data model.
X. Object-Oriented Data Model
The object-oriented data model combines procedures (called methods) and data into an
inseparable package called an object.
Essentially, an object knows how to process its own data. In that way you just send it a
message saying "print yourself" and the internal procedure does what is necessary.
The structure of objects is defined so that you have a network of subclasses and
superclasses. Subclass objects inherit attributes and methods from their superclass
objects.
An object can be composed of any kind of data; for example, text, procedures, pictures,
and other objects.
A major goal of the OODM is to model the organizational behavior, not just its structure.
The object-oriented model was developed for dynamic situations where the structure
changes often and the data required to perform data requests would normally be widely
scattered in traditional DBMS metadata.
Also, the OODB model suits systems where there are few object instances relative to the
number of object classes (different from business processing), for example, CAD/CAM and
electronic circuit design/simulation.
A. Object
In OODB, each object represents a physical entity, a concept, an idea, an event, or some
aspect of interest to the database application.
Each object instance is a self-contained mixture of data and procedure.
B. Object Class
The structure of the OODB is captured in the network of object classes. This is similar to
the schema in more traditional DBMSs.
Subclasses can inherit the methods and attribute values of their superclasses.
C. Object Instance
Equivalent to a record instance. A specific instance of an object class.
D. Message
Objects communicate and perform all operations via messages. A message consists of an
object (or several objects) followed by a method to be applied to these objects.
An alternate approach is to send the name of the message and any needed attribute values
to the object instance.
E. Method
A method is similar to a procedure. Object methods are stored in the object class
hierarchy and are available to all object instances.
If several methods have the same name but perform different actions it is called operator
overloading or polymorphism.
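Messages, methods, and polymorphism can be sketched together in Python. The Drawing and
CircuitDiagram classes and the send helper are hypothetical names made up for this
example. Python normally writes a message send as obj.method(), and send just makes the
"name of the message" explicit.

```python
class Drawing:
    """Superclass: holds data and the method that knows how to process it."""

    def __init__(self, title):
        self.title = title

    def print_yourself(self):
        return f"Drawing '{self.title}'"

class CircuitDiagram(Drawing):
    """Subclass: inherits title, redefines the method with the same name."""

    def __init__(self, title, components):
        super().__init__(title)
        self.components = components

    def print_yourself(self):  # same message, different action
        return f"Circuit '{self.title}' with {len(self.components)} components"

def send(obj, message, *args):
    """Deliver a message by looking up the method of that name on the receiver."""
    return getattr(obj, message)(*args)

objects = [Drawing("floor plan"), CircuitDiagram("amplifier", ["R1", "C1"])]
replies = [send(o, "print_yourself") for o in objects]
```

The sender does not need to know which class it is talking to. Each object picks its own
version of the method.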
F. Actor (Demon)
Objects that wait for messages before they do something are called passive objects.
Ones that perform operations based on other stimuli are called active objects or demons.
Active objects are started without specific messages and can be used to perform
background work and "watch dog" type functions.