Semantic Data Modelling for databases:
Issues of modelling and teaching the paradigm.
Trujillo J. *, King A. G. †, Palomar M. ‡
* Dpto. de Economía Financiera. Universidad de Alicante.
E-03071. Alicante. Spain
E-mail: trujillo@dlsi.ua.es
† System Engineering Faculty. Southampton Institute.
Southampton SO14 0YN. England.
E-mail: king_g@solent.ac.uk
‡ Dpto. de Lenguajes y Sistemas Informáticos.
Universidad de Alicante. E-03071. Alicante. Spain.
E-mail: mpalomar@dlsi.ua.es
Abstract
The aim of this paper is to discuss the role of Semantic Data Models (SDM) in database
modelling. First, the classical models and the paradigm used for database modelling are
reviewed. Second, a general description of SDM and the advantages they offer for
database modelling are given. Current research on SDM is then surveyed, followed by a
review of the influence of the Object-Oriented paradigm on Object-Oriented databases
(OODB) and on SDM. Finally, a stronger Object Orientation in SDM is suggested as a way
to improve the quality of teaching in database modelling at universities.
1 Introduction
Three stages can be distinguished within database design: conceptual, logical
and physical design. Conceptual design (also called conceptual data modelling)
aims to capture the description and behaviour of objects in the real world. The
conceptual design must then be transformed into a logical design by finding
structured representations for these objects in the database. Finally, a physical
database design is obtained. Three classical models were used for database
modelling in the early 1970s: the network model (CODASYL [8]), the
hierarchical model (Tsichritzis [18]) and the relational model (Codd [9]).
The network model provides a natural view of data, although its logical
schema is quite difficult to understand when it represents data relationships. It
also exhibits very high data dependence. The hierarchical model provides a much
weaker natural view of data, although it offers very high data independence.
Furthermore, the data manipulation languages of both models tend to be
navigational, in the sense that the user must access data by following physical
connections rather than through the real meaning of the data. The main purpose
of the relational data model, whose latest version can be found in Codd [10], is
to support a Structured Query Language (SQL) that accesses data in a more
logical way than the previous models.
Although the relational model is widely used for business database modelling,
it has received much criticism in recent years; some of it can be found in
Beynon-Davies [1]. In addition, the landscape of database applications has
changed dramatically. New applications such as CAD, CASE, office information
systems and geoscientific databases need more sophisticated database
functionality and the management of more complex structured objects.
Furthermore, all of the classical models fail to capture much of the
semantics associated with data. In all three models, the fundamental modelling
construct, the record or relation, does not constitute an atomic semantic unit.
As a result, all of them require additional constraints to maintain the semantic
integrity of the database. Moreover, because these records or relations may not
always correspond to a single object, these models require complex
normalisation procedures. A main consequence is that conceptual database
schemas are often difficult to design.
In addition to the problems mentioned above, the current paradigm for
database modelling studies the static aspects (object description) and the
dynamic aspects (object behaviour) separately during conceptual design. The
dynamic aspects of data are handled by a process methodology such as Yourdon
(Yourdon [19]), whereas the entity-relationship (E-R) model (Chen [5]) is the
model most commonly used to capture the static aspects. The E-R model was
originally created to provide a conceptual schema that overcomes the lack of
semantics mentioned above.
This way of proceeding has two main disadvantages. First, operations are
not defined on objects and data types. As a result, many operations belong to a
particular program rather than to the database design. This creates a design
inconsistency, because most operations that the database programs implement
on the objects are not reflected in the conceptual schema. Second, the allowed
database states are discussed only together with the static aspects of the
database. Therefore, many dynamic constraints are required; they are usually
expressed in a text-based language.
2 Semantic data models
First, the motivation for using SDM in database modelling is reviewed. Then,
a general description of SDM and the advantages they offer are presented.
Finally, current research on SDM is reviewed.
2.1 Motivation
A distinction must first be drawn between the earliest and the latest SDM
research. The earliest SDM were created in the 1970s. These models were
introduced primarily as schema design tools, i.e. a schema could be designed in an
SDM and then translated into a logical model such as the relational model. In this
period, the main contribution of SDM was to provide more powerful mechanisms
for representing the structural aspects of business data than those supported by
the classical models (hierarchical, network and relational). The technique
followed was to provide a higher level of abstraction for data modelling, allowing
database designers to discuss the data as they appear in the real world and
supporting a top-down, modular view of the schema.
In recent years, database modelling research has focused on representing
the behavioural aspects of data in the conceptual model. Behavioural aspects
were first considered in Brodie [2], which incorporated transactions and actions
on objects and data types; thanks to the abstraction techniques supported by
SDM, these were discussed together with the structural aspects. These attempts
have been strongly influenced by the Object-Oriented paradigm, owing to the
emergence of Object-Oriented programming languages (OOPL).
2.2 General description
Many SDM in the literature do not share a common terminology and are not
usually defined formally. This section therefore gives a brief description of the
features and components found in current research.
- Entities (objects) and entity types (classes). An entity (object) can loosely be
defined as a thing that exists in the database and is distinguishable from all others
without ambiguity (an atomic unit). Objects may correspond to real-world objects
or may be defined for other purposes. Objects that share common characteristics
are grouped into entity types (classes).
- Type hierarchy (ISA, classification, generalisation, specialisation). A type
hierarchy is used when objects share common properties while also having
properties unique to them.
- Attributes and domains. Attributes are features defined on both entities and
entity types. A domain is a set of values of similar type; attributes take their
values from these domains.
- Relationships. A relationship represents an instance of an association between
several objects. A relationship can be one-to-one, one-to-many or many-to-many,
depending on how many objects take part in it. A relationship type corresponds
to a collection of similar relationships or to an aggregation of two or more entity
types.
- Rules. Rules are the set of mechanisms that provide derived data, together with
the set of constraints that limit the facts that can be consistent with the model.
Derived data are virtual data that are defined by the user and not stored in the
database. Constraints express properties of data that cannot be captured by the
data structures; they are usually restrictions. Two kinds of constraint can be
distinguished: static constraints (which define the allowable database states) and
dynamic constraints (which restrict the possible database transactions).
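The components listed above can be illustrated with a minimal sketch. It is
written in Python rather than in any SDM notation, and the schema (Person,
Employee, Department, and their attributes) is hypothetical and does not come
from any of the models reviewed in this paper.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical schema used only for illustration.

@dataclass
class Person:                       # entity type (class)
    name: str                       # attribute, domain: strings
    birth_year: int                 # attribute, domain: integers

    def age_in(self, year: int) -> int:
        # Rule (derived data): computed on demand, never stored.
        return year - self.birth_year


@dataclass
class Employee(Person):             # type hierarchy: Employee ISA Person
    salary: float = 0.0             # property specific to employees

    def __post_init__(self) -> None:
        # Rule (static constraint): restricts the allowable states.
        if self.salary < 0:
            raise ValueError("salary must be non-negative")


@dataclass
class Department:                   # entity type
    name: str
    # One-to-many relationship: one department, many employees.
    members: List[Employee] = field(default_factory=list)

    def hire(self, employee: Employee) -> None:
        self.members.append(employee)
```

In this sketch an Employee object inherits the attributes of Person, the age is
derived rather than stored, and an object with a negative salary cannot be
created at all.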
2.3 Advantages
The advantages offered by SDM are as follows:
a) Increased separation of logical and physical components.
As mentioned in the introduction, the classical models require a physical path to
be followed to access data. This holds even in the relational model, where the
user must simulate pointers by comparing identifiers in order to traverse from one
relation to another (Hull [14]). In contrast, the attributes of SDM may be used as
direct conceptual pointers. Thus, SDM allow the user to navigate through the
schema by applying attributes directly to objects.
b) Decreased semantic overloading of relationship types.
The term semantic overloading refers to the situation in which different kinds of
relationship are represented using the same constructors. In the relational
model, for example, there are only two ways of representing relationships
between objects: within a relation, or by using the same values in two or more
relations (Hull [14]). As can be seen in the previous section, SDM provide a rich
set of constructors for representing the different kinds of objects in the real
world and the relationships between them.
c) Availability of convenient abstraction mechanisms.
SDM provide mechanisms for viewing and accessing the logical schema at
different levels of abstraction. All the constructors used in SDM allow the user to
access portions of the schema, i.e. objects, relationships and data types can be
accessed in isolation from the others. This allows the user to obtain derived
schema components: to identify a specific subset of data, possibly perform
computations on it, and then structure it in a new format. Whereas in the
relational model derived schema components must be either new relations or
new columns in existing relations, SDM provide a richer framework for derived
data.
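The contrast drawn in advantage (a) between simulated pointers and direct
conceptual pointers can be sketched as follows. This is only an illustration in
Python under assumed data: the tables, the identifiers dept_id and emp_id, and
the attribute works_in are hypothetical.

```python
from dataclasses import dataclass

# Relational style: rows are connected only by matching identifier values.
departments = [
    {"dept_id": 1, "name": "Sales"},
    {"dept_id": 2, "name": "Research"},
]
employees = [
    {"emp_id": 10, "name": "Ana", "dept_id": 2},
]


def department_name_relational(emp_id: int) -> str:
    # The pointer is simulated by comparing identifiers across relations.
    emp = next(e for e in employees if e["emp_id"] == emp_id)
    dept = next(d for d in departments if d["dept_id"] == emp["dept_id"])
    return dept["name"]


# Semantic style: the attribute itself refers to the related object.
@dataclass
class Department:
    name: str


@dataclass
class Employee:
    name: str
    works_in: Department    # attribute used as a direct conceptual pointer


ana = Employee("Ana", Department("Research"))
```

Both department_name_relational(10) and ana.works_in.name yield the department
name, but only the first requires the user to know which identifiers link the
two relations.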
Because recent research shares a common aim of incorporating object
behaviour into the conceptual schema, two further major advantages can be
noted:
d) More consistency between logical and physical representations.
Because operations are defined on objects in the conceptual design, the object
functions implemented in the database are exactly those defined in the
conceptual schema. The programmer does not have to interpret the conceptual
schema to work out which object operations must be implemented. A more
straightforward transformation into the physical schema can thus be obtained.
e) Database evolution supported by integrity constraints.
To execute the operations mentioned above, a group of dynamic constraints
must be checked to ensure that the new database state is allowed according to
the conceptual schema. These constraints define the permitted evolution of the
database. To date, the most common way of doing this is through preconditions
and postconditions on the defined operations, as in Engels [11].
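A dynamic constraint checked through a precondition and a postcondition can be
sketched as below. The sketch is in Python and does not reproduce the notation
of Engels [11]; the Employee type, the change_salary operation and the rule
that a salary may never decrease are assumptions chosen for illustration.

```python
from dataclasses import dataclass


class ConstraintViolation(Exception):
    """Raised when an operation would lead to a disallowed state change."""


@dataclass
class Employee:
    name: str
    salary: int

    def change_salary(self, new_salary: int) -> None:
        # Precondition (dynamic constraint): compares the old state with
        # the requested new state; a salary may never decrease.
        if new_salary < self.salary:
            raise ConstraintViolation("salary may not decrease")

        old_salary = self.salary
        self.salary = new_salary

        # Postcondition: the new state must reflect the requested change.
        # If it does not, the previous state is restored.
        if not (self.salary == new_salary and self.salary >= old_salary):
            self.salary = old_salary
            raise ConstraintViolation("postcondition failed")
```

A rejected operation leaves the object in its previous state, so the database
only ever moves between states permitted by the conceptual schema.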
2.4 Review of current Semantic Data Models
A great deal of research has been carried out in this area since the 1970s. Given
the limited length of this paper, it is not possible to reference all of it, so only
the most relevant SDM are briefly mentioned. There are two different approaches
within this area, which differ in the way data are accessed and constructed. Some
SDM are based on attributes (functions) applied to objects and data types; others
are based on constructor types (classification, aggregation, generalisation and
association).
A General Semantic Model is presented in Hull [14]; it provides the general
characteristics that an SDM must support in order to capture structured data.
Another Semantic Data Model is presented in Hammer [13]. This model is
based on the typical class structure of OOPL, in which objects and the functions
that can be applied to them are defined in the class construction. The main
problem with this model is that derived data are difficult to introduce into the
schema, i.e. the class structure is very rigid. In addition, the class structure is
not rich enough to represent all the data relationships present in the real world,
and it cannot capture all the constraints needed. Thus, another way of expressing
integrity constraints is required.
In the model presented in Brodie [3], special emphasis is laid on
incorporating behavioural aspects into objects, in addition to providing all the
constructors needed to create new objects. The great advantage of this model is
that operations are represented in the conceptual schema. On the other hand,
the schema becomes quite complex when representing the structural aspects of
data; particular problems arise in representing ISA constructors and
relationships.
Two recent pieces of research extend the classical entity-relationship (E-R)
model. The first, Chen [6], introduces two relevant advances. Firstly, a small
modification of the SQL language is presented that makes it suitable for
representing integrity constraints. Secondly, the actions performed on objects
(update, delete, insert and query) are treated. Nevertheless, user-defined
transactions are not studied. A second extension is presented in Engels [11],
where emphasis is laid on incorporating behavioural aspects into data. The SQL
language is extended considerably to represent both static and dynamic
constraints. Nevertheless, transactions are not represented on objects in the
schema and must be constructed entirely by hand.
The most relevant attribute-based (function-based) model is the Functional
Data Model (FDM), whose first version was developed in Shipman [16]. Special
emphasis is laid on derived data, which are handled by functions applied directly
to attributes and objects. The FDM is supported by a specification language
called DAPLEX.
3 The Object-Oriented paradigm applied to databases and SDM
The Object-Oriented paradigm is being applied to database modelling owing to
the emergence of OOPL. Much research on OODB has been carried out in recent
years, applying the benefits offered by OOPL: mainly reusability, modularity and
extensibility (Graham [12]). Moreover, objects are stored as they are defined.
They are therefore ready to use, and no transformation is needed when storing
or retrieving data. As a consequence, savings in time and space are obtained.
Although no formal model is presently available for an Object-Oriented
Design Method (OODM) for conceptual database modelling, most research and
prototypes have followed a common paradigm. First, classes are defined that
group similar objects. Second, new classes are constructed through inheritance.
Finally, further classes are defined as requirements evolve (Chorafas [7]).
The emergence of OODB has much in common with Semantic Data
Modelling. Both use the notion of objects with a unique object identity and the
notion of inheritance through a hierarchy of types. However, there are some
differences. OODB have properties such as encapsulation and late binding that
are not supported by SDM. Thus, OODB store both the data and the programs
associated with objects. On the other hand, OODB lack many semantic
capabilities, and their structure is quite rigid with respect to derived data.
We think that an SDM which incorporates both the static and the dynamic
aspects of objects could be a better approach to conceptual database modelling.
This would be an evolution towards Object-Oriented data models. Our research
is currently focused on this area.
4 Improving the current teaching methods on conceptual data modelling
The last point of this paper is to review the current teaching methods for
database modelling at universities and how their quality could be improved. Not
only do these methods follow the paradigm described in the introduction, i.e. the
static and dynamic aspects of databases are taught separately, but at many
universities the two are also taught in different years of the degree, which means
a total separation between objects and the operations defined on them.
It is widely recognised that the main problem in designing large information
systems is how to integrate the different kinds of model used in the design stage
(Yourdon [19]). Although many software tools are being created to automate
part of this process, the problem remains for very large systems.
Because of this way of proceeding, students are not prepared to solve all the
problems that arise in industry. Only experience in designing very large systems
allows them to cope with these problems.
This teaching method could therefore be improved by the SDM suggested in
the previous section. The problem with teaching this new approach, however, is
that there is neither sufficient formalism nor tools supporting the logical design
schema. This is why the current paradigm is still taught at universities, even
though its disadvantages are widely known.
5 Conclusion
The purpose of this paper has been to provide a general study of database
modelling. Although business databases are currently developed using the
classical models and paradigm, many disadvantages of this approach have been
analysed in the previous sections. Because of these disadvantages, many
researchers have proposed SDM since the 1970s to overcome the deficiencies of
the classical way of proceeding. To improve the quality of database modelling,
the latest research focuses on treating the static and dynamic aspects of
databases together in the conceptual design. Unfortunately, no formal method
is presently available to achieve this. It is hoped that a formal Object-Oriented
data model will be found, not only to be applied directly in industry but also to
be taught at university.
6 References
[1] Beynon-Davies, P. “Relational database systems”. Ed. Blackwell Scientific
Publications. Chapter 8. 1991
[2] Brodie, M., Ridjanovic, D. “On the design and specification of Database
Transactions”. 1. 1984
[3] Brodie, M., Silva, E. “Active and Passive component modelling:
ACM/PCM”. Information systems design methodologies: A Comparative
Review. Ed. Olle, Sol, Verrijn-Stuart. North-Holland Publishing Company. 1982
[4] Canós, J.H. “OASIS: un lenguaje único para Bases de Datos Orientadas a
Objetos”. PhD. Thesis. U.P. Valencia 1996
[5] Chen, P. “The Entity-Relationship model”. ACM Trans. Database Systems
1(1). 9-36. 1976
[6] Chen, P. “The Entity-Relationship Model-Toward a Unified View of Data”.
Readings in Artificial Intelligence and databases. Ed. Mylopoulos, J., Brodie, M.
1989
[7] Chorafas, D., Steinmann, H. “Object-Oriented Databases”. Ed. Prentice
Hall. 1993
[8] Data base task group report. “CODASYL”. ACM, New York, 1971
[9] Codd, E.F. “A relational model for large shared databases”.
Communications ACM 13, 377-387. 1970
[10] Codd, E.F. “The relational model for Database Management, Version 2”.
Addison-Wesley, Reading, Ma. 1990
[11] Engels, G., Gogolla, M., Hohenstein, U., Hulsmann, K., Lohr-Richter, P.,
Saake, G. and Ehrich, M.. “Conceptual modelling of database applications using
an extended ER”. Data and Knowledge Engineering, 9 (1992/93). 157-204.
North-Holland. 1993
[12] Graham, I. “Object Oriented Methods”. Addison-Wesley. 1994
[13] Hammer, M., McLeod, D. “Database Description with SDM: A semantic
Database Model”. ACM Transactions on Database Systems. Vol. 6, No.3.
September. 1981. Pages 351-386
[14] Hull, R., King, R. “Semantic Database Modeling: Survey, Applications and
Research Issues”. ACM Computing Surveys, Vol. 19, No.3. Sept. 1987
[15] Khoshafian, S. “Object-Oriented Databases”. Ed. Wiley Professional
Computing. 1993
[16] Shipman, D. “The Functional Data Model and the Data Language
DAPLEX”. ACM Transactions on Database Systems, Vol. 6, No.1, March 1981.
[17] Ter Bekke, J. “Semantic Data Modeling”. Ed. Prentice-Hall. 1992
[18] Tsichritzis, Lochovsky. “Hierarchical data-base management: a survey”.
ACM Computing Surveys 8, 105-123. 1976.
[19] Yourdon, E. “Modern Structured Analysis”. Ed. Prentice Hall. 1993