Data Models
Avi Silberschatz
Henry F. Korth
S. Sudarshan
1 Introduction
Underlying the structure of a database is a data model. A data model is a collection of conceptual
tools for describing the real-world entities to be modeled in the database and the relationships
among these entities. Data models differ in the primitives available for describing data and in
the amount of semantic detail that can be expressed. The various data models that have been
proposed fall into three different groups: object-based logical models, record-based logical models,
and physical data models. Physical data models describe data at the lowest level and capture
aspects of database system implementation that are not covered in
this article. Thus, our focus here is on the object-based and record-based logical models.
Recently, a new model, the object-relational model, has been developed. It merges the object-oriented data model with the dominant record-based model, the relational model. We discuss this
model briefly at the end of this article.
Further details on data models appear in database texts, including [Silberschatz et al. 1996]
and [Ullman 1988].
2 Object-Based Logical Models
The object-based models use the concepts of entities or objects and relationships among them
rather than the implementation-based concepts, such as records, used in the record-based models.
Object-based logical models provide flexible structuring capabilities and allow data constraints to
be specified explicitly. Below, we present descriptions of the two most widely used representatives
of these models: the Entity-Relationship model, and the Object-Oriented model.
2.1 The Entity-Relationship Model
The Entity-Relationship (E-R) data model is one of several semantic data models; that is, it attempts to represent the meaning of the data.
The E-R model employs three basic concepts: entity sets, relationship sets, and attributes. An
entity is an "object" in the real world that is distinguishable from all other objects. An entity set
is a set of entities of the same type that share the same properties (or attributes). Attributes are
descriptive properties possessed by all members of an entity set. Each entity has its own value for
each attribute. A set of attributes that suffices to distinguish all entities in an entity set is called
a primary key. A relationship is an association among several entities.
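To make these concepts concrete, here is a minimal Python sketch, not part of the E-R model itself. The entity types Employee and Department, their attributes, the helper is_key, and all sample values are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical entity types; the attribute names are illustrative only.
@dataclass(frozen=True)
class Employee:
    emp_id: int      # intended primary key attribute
    name: str

@dataclass(frozen=True)
class Department:
    dept_name: str   # intended primary key attribute
    building: str

# An entity set is a set of entities of the same type.
employees = {Employee(1, "Ada"), Employee(2, "Grace")}
departments = {Department("Research", "North"), Department("Sales", "South")}

def is_key(entity_set, key_attrs):
    """Return True if the values of key_attrs distinguish every entity in the set."""
    seen = {tuple(getattr(e, a) for a in key_attrs) for e in entity_set}
    return len(seen) == len(entity_set)

# A relationship set associates entities; here each member pairs the
# key values of one employee and one department.
works_in = {(1, "Research"), (2, "Sales")}
```

In this sketch, is_key only checks the entities currently in the set; a real primary key is a design-time constraint that must hold for every entity the set may ever contain.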
Extended E-R features include specialization, generalization, higher- and lower-level entity sets,
attribute inheritance, and aggregation. An explanation of these features is beyond the scope of
this article. Further discussion of the E-R model appears in [Chen 1976], which introduced the E-R
model.
2.2 Object-Oriented Model
The object-oriented data model is an adaptation of the object-oriented programming language
paradigm to database systems. The model is based on the concept of encapsulating data, and code
that operates on that data, in an object. Entities, in the sense of the E-R model, are represented
as objects with attribute values represented by instance variables within the object. The value
stored in an instance variable is itself an object. Thus, a containment relationship, the is-part-of
relationship, is established among objects. An advantage of the containment concept is the ability
for objects to be shared among several containing objects.
An object may send a message to another object, causing that object to execute a method
in response. Methods are procedures, written in a general-purpose programming language, that
manipulate the object's local instance variables and may send messages to other objects. This
encapsulation of code and data has proven useful in developing modular systems. Objects that
contain the same types of values and the same methods are grouped together into classes. A class
may be viewed as a type definition for objects. Classes are organized into an inheritance hierarchy;
each class inherits attributes and methods from classes that are above it in the hierarchy. This
combination of data and code into a type definition is similar to the programming language concept
of abstract data types. This hierarchical structure facilitates code sharing among classes. Taking full
advantage of both the code- and object-sharing features is an important aspect of object-oriented
data modeling.
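The ideas above can be sketched in an ordinary object-oriented language. The following Python fragment is only an illustration under assumed names (Address, Person, Employee, raise_salary are all hypothetical); it shows instance variables holding objects, a method invoked by a message, inheritance, and one object shared by two containing objects.

```python
class Address:
    """An object that can be contained in (and shared by) other objects."""
    def __init__(self, city):
        self.city = city

class Person:
    def __init__(self, name, address):
        self.name = name          # instance variable holding an object (a str)
        self.address = address    # containment: the Address is part of the Person

    def describe(self):
        # A method: code that operates on the object's own instance variables.
        return f"{self.name} lives in {self.address.city}"

class Employee(Person):
    """Subclass: inherits name, address, and describe() from Person."""
    def __init__(self, name, address, salary):
        super().__init__(name, address)
        self.salary = salary

    def raise_salary(self, percent):
        self.salary = self.salary * (100 + percent) / 100

shared = Address("Austin")
alice = Person("Alice", shared)
bob = Employee("Bob", shared, 50000)   # the same Address object is shared
bob.raise_salary(10)                   # "sending a message" to bob
```

Because alice and bob contain the same Address object, a change to shared.city is visible through both, which is the object-sharing advantage mentioned above. Persistence and integrity constraints, which a database would add, are not modeled here.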
Object-oriented data models for databases extend the above-mentioned data modeling features
of the object-oriented paradigm. The extensions include data integrity constraints, persistence
of data (which allows transient data to be distinguished from persistent data) and support for
collections. There are two approaches to creating an object-oriented database language:
1. Extending existing database languages with concepts from the object-oriented paradigm.
2. Extending existing object-oriented programming languages to deal with databases by adding
concepts such as persistence and collections.
For further discussion of the object-oriented model see [Kim 1990].
3 Record-Based Logical Models
Record-based models are so named because the database is structured in fixed-format records of
several types. Each record type defines a fixed number of fields, or attributes, and each field is usually of a fixed length. The use of fixed-length records simplifies the physical-level implementation of
the database. The relational model has established itself as the primary data model for commercial
data processing applications. The first database systems were based on either the network model
or the hierarchical model, both of which are tied more closely to the underlying implementation of
the database, and are now decreasing in importance and real-world use.
3.1 The Relational Model
The power of the relational data model lies in its rigorous mathematical foundations and a simple
user-level paradigm. Mathematically speaking, a relation is a subset of the cartesian product of an
ordered list of domains. For example, let $E$ be the set of all employee identification numbers,
$D$ the set of all department names, and $S$ the set of all salaries. An employment relation is a set of
3-tuples $(e, d, s)$ where $e \in E$, $d \in D$, and $s \in S$. A tuple $(e, d, s)$ represents the fact that employee
$e$ works in department $d$ and earns salary $s$.
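The definition of a relation as a subset of a Cartesian product can be checked directly on small domains. In this sketch the contents of the three domains and of the employment relation are made-up sample values.

```python
from itertools import product

E = {101, 102}                 # employee identification numbers (sample values)
D = {"Research", "Sales"}      # department names (sample values)
S = {50000, 60000}             # salaries (sample values)

# The Cartesian product E x D x S contains every possible 3-tuple.
all_tuples = set(product(E, D, S))

# A relation is any subset of that product; this one records two facts.
employment = {(101, "Research", 60000), (102, "Sales", 50000)}

is_relation = employment <= all_tuples   # subset test
```

The product contains tuples such as (101, "Sales", 50000) that are not in the relation; the relation asserts only those combinations that hold in the modeled world.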
At the user level, we represent a relation as a table. This table has one column for each domain
and one row for each tuple. Each column has a name, which serves as a column header, and is
called an attribute of the relation. The set of attributes for a relation is called the relation schema.
The process of designing a relational database involves the selection of a set of relation schemas.
An initial set of schemas can be generated from an E-R database design by using a relation to
represent each entity set and relationship set. There are often many possible choices that the
database designer might make. To illustrate these choices, consider a database of employees,
departments, and managers. Assume that a department has only one manager. If we use a single
schema (employee, department, manager), then we must repeat the manager of a department once
for each employee.
We can avoid this redundancy by using two schemas (employee, manager) and (manager, department). However, if a particular manager manages two departments, we cannot represent a
situation where an employee works in only one of these two departments. If, instead, we choose the
two schemas (employee, department) and (manager, department), we would avoid this difficulty,
and, at the same time, avoid redundancy. The theory of normalization helps in the choice of relation
schemas.
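The three design choices discussed above can be compared on a small example. This is a sketch with invented data in which one manager, "mia", manages two departments; relations are modeled as Python sets of tuples.

```python
# Single schema (employee, department, manager): the manager is repeated
# once for each employee of the department.
single = {
    ("ann", "research", "mia"),
    ("bob", "research", "mia"),
    ("cal", "sales", "mia"),
}

# Decomposition into (employee, department) and (manager, department).
emp_dept = {(e, d) for (e, d, m) in single}
mgr_dept = {(m, d) for (e, d, m) in single}

# Joining the two schemas on department recovers the original facts.
rejoined = {(e, d, m) for (e, d) in emp_dept for (m, d2) in mgr_dept if d == d2}

# The alternative decomposition (employee, manager) and (manager, department)
# joins on manager. Because "mia" manages two departments, the join cannot
# tell which department each employee works in.
emp_mgr = {(e, m) for (e, d, m) in single}
ambiguous = {(e, d, m) for (e, m) in emp_mgr for (m2, d) in mgr_dept if m == m2}
```

Here rejoined equals single, while ambiguous also contains tuples such as ("ann", "sales", "mia") that were never stated. Normalization theory gives systematic criteria for telling such decompositions apart.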
There are several languages for expressing operations on relational databases. In all these
languages, the expressions and/or operations are over relations, and their results are also relations.
This allows queries to be constructed modularly from sub-queries, and allows for automated query
optimization. The relational calculus is a nonprocedural language, based on mathematical logic,
that defines the basic power required in a relational query language. The relational algebra is a
procedural language that is equivalent in power to the relational calculus, and defines the basic
operations used within relational query languages.
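To illustrate how operators that take and return relations compose into queries, here is a small sketch of three relational algebra operations. The representation (a pair of attribute names and a set of tuples), the function names, and the sample data are assumptions made for this example, not a standard API.

```python
def select(rel, predicate):
    """Selection: keep the rows that satisfy predicate (called with a row as a dict)."""
    attrs, rows = rel
    return attrs, {r for r in rows if predicate(dict(zip(attrs, r)))}

def project(rel, keep):
    """Projection: keep only the named attributes; duplicate rows collapse."""
    attrs, rows = rel
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(r[i] for i in idx) for r in rows}

def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attributes."""
    la, lrows = left
    ra, rrows = right
    shared = [a for a in la if a in ra]
    extra = [a for a in ra if a not in la]
    out = set()
    for lrow in lrows:
        for rrow in rrows:
            ld, rd = dict(zip(la, lrow)), dict(zip(ra, rrow))
            if all(ld[a] == rd[a] for a in shared):
                out.add(lrow + tuple(rd[a] for a in extra))
    return la + tuple(extra), out

employment = (("emp", "dept", "salary"),
              {(101, "research", 60000), (102, "sales", 50000)})
management = (("dept", "manager"),
              {("research", "mia"), ("sales", "noor")})

# Each operator returns a relation, so operators nest into a query:
# managers of employees who earn more than 55000.
query = project(
    select(natural_join(employment, management), lambda t: t["salary"] > 55000),
    ["manager"],
)
```

Because every intermediate result is itself a relation, an optimizer is free to reorder such expressions, for example by applying the selection before the join, without changing the result.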
Commercial database systems use languages with more "syntactic sugar." The three most influential commercial languages are SQL, QBE, and Quel. Of these three, SQL has clearly established
itself as the standard relational database language, represented by the SQL-92 standard. Further
versions of the SQL standard are under development.
Further discussion of the relational model can be found in the seminal paper by [Codd 1970],
which introduced the relational model. Formal aspects of the relational model are presented in
detail in [Maier 1983].
3.2 The Network and Hierarchical Models
The network data model is an abstraction of the design concepts used in the implementation of
databases. As a result, the model is tied more closely to physical-level design than is the relational
model. In the network model, data items are represented by collections of records and relationships
among data are represented by links, which correspond to pointers at the physical level.
The hierarchical model is similar to the network model except that links in the hierarchical
model must form a tree structure, while the network model allows arbitrary graphs.
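The difference between the two models can be sketched with records connected by links. The Record class, the is_hierarchy check, and the sample records below are hypothetical; links are treated as directed pointers from a parent record to a child record.

```python
class Record:
    """A record with named fields and links (pointers) to other records."""
    def __init__(self, **fields):
        self.fields = fields
        self.links = []

    def link(self, other):
        self.links.append(other)

def is_hierarchy(root):
    """Return True if the records reachable from root form a tree,
    that is, no record is reached along more than one link path."""
    seen = set()
    stack = [root]
    while stack:
        rec = stack.pop()
        if id(rec) in seen:
            return False      # reached twice: shared child or cycle
        seen.add(id(rec))
        stack.extend(rec.links)
    return True

company = Record(name="acme")
research = Record(name="research")
ann = Record(name="ann")
company.link(research)
research.link(ann)
tree_ok = is_hierarchy(company)      # every record has a single parent

project = Record(name="apollo")
company.link(project)
project.link(ann)                    # ann now has two parents
graph_ok = is_hierarchy(company)
```

The first structure is acceptable in the hierarchical model; after the extra link, the same data forms a general graph that only the network model permits.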
4 Object-Relational Data Models
Object-relational data models are hybrids of the object-oriented and the relational data models.
They extend the relational data model by providing an extended type system and object-oriented
concepts such as object identity. The extended type systems allow complex types including nonatomic values such as nested relations, and inheritance at the level of attribute domains as well
as at the level of relations. Such extensions attempt to preserve the relational foundations, while
extending the modeling power.
There is a trend towards the amalgamation of features of the relational and object-oriented
models. The SQL-3 standard currently under development includes object-oriented features within
the framework of an extended version of the current relational SQL standard. Market-leading
relational database products are adding object-oriented features so as to compete with object-oriented and object-relational database products. Future database systems can be expected to offer
the high level of abstraction of object-orientation along with the relative efficiency and uniformity
of the relational model.
Bibliography
[Chen 1976] P. P. Chen, "The Entity-Relationship Model: Toward a Unified View of Data," ACM
Transactions on Database Systems, Volume 1, Number 1 (January 1976), pages 9-36.
[Codd 1970] E. F. Codd, "A Relational Model of Data for Large Shared Data Banks," Communications
of the ACM, Volume 13, Number 6 (June 1970), pages 377-387.
[Kim 1990] W. Kim, Introduction to Object-Oriented Databases, MIT Press, Cambridge, MA
(1990).
[Maier 1983] D. Maier, The Theory of Relational Databases, Computer Science Press, Rockville,
MD (1983).
[Silberschatz et al. 1996] A. Silberschatz, H. F. Korth, and S. Sudarshan, Database System
Concepts, Third Edition, McGraw-Hill, New York, NY (1996).
[Ullman 1988] J. D. Ullman, Principles of Database and Knowledge-Base Systems, Volume I,
Computer Science Press, Rockville, MD (1988).