Semantic Data Modelling for databases: Issues of modelling and teaching the paradigm

Trujillo J. *, King A. G. †, Palomar M. ‡

* Dpto. de Economía Financiera. Universidad de Alicante. E-03071. Alicante. Spain. E-mail: trujillo@dlsi.ua.es
† System Engineering Faculty. Southampton Institute. Southampton SO14 OYN. England. E-mail: king_g@solent.ac.uk
‡ Dpto. de Lenguajes y Sistemas Informáticos. Universidad de Alicante. E-03071. Alicante. Spain. E-mail: mpalomar@dlsi.ua.es

Abstract

The aim of this paper is to discuss the role of Semantic Data Models (SDM) in database modelling. First, the classical models and the paradigm used for database modelling are reviewed. Second, a general description of SDM and the advantages they offer for database modelling are given. Current research on SDM is then reviewed, followed by the influence of the Object-Oriented paradigm on Object-Oriented databases (OODB) and SDM. Finally, a stronger object orientation in SDM is suggested as a way to improve the quality of teaching methods in database modelling at universities.

1 Introduction

Within the design stage of database modelling, conceptual, logical and physical design can be distinguished. Conceptual design (also called conceptual data modelling) aims to capture the description and behaviour of real-world objects. This conceptual design must then be transformed into a logical design by finding structured representations for these objects in the database. Finally, a physical database design is obtained.

Three classical models were used for database modelling in the early 70’s: the network model, CODASYL [8]; the hierarchical model, Tsichritzis [18]; and the relational model, Codd [9]. The network model provides a natural view of data, although its logical schema is quite difficult to understand when representing data relationships. It also exhibits very high data dependence.
The hierarchical model provides only a very weak natural view of data, although it offers very high data independence. Furthermore, the data manipulation languages of both models tend to be navigational, in the sense that the user must access data through physical connections rather than through the real meaning of the data.

The main purpose of the relational data model, whose latest version can be found in Codd [10], is to support a Structured Query Language (SQL) that accesses data in a more logical way than the previous models. Although this model is widely used for business database modelling, it has received much criticism in recent years; some of it can be found in Beynon-Davies [1]. In addition, the landscape of database applications has changed dramatically. New applications such as CAD, CASE, office information systems and geoscientific databases need more sophisticated database functionality and the management of more complex structured objects.

Moreover, all of the classical models fail to capture much of the semantics associated with data. In all three models, the fundamental modelling construct (record or relation) does not constitute an atomic semantic unit. As a result, all of them require additional constraints to maintain the semantic integrity of the database. Furthermore, since these records or relations may not always correspond to a single object, these models require complex normalisation procedures. The main consequence is that conceptual database schemas are often difficult to design.

In addition to the problems mentioned above, the static aspects (object description) and the dynamic aspects (object behaviour) are studied separately during conceptual design in the current paradigm for database modelling. The dynamic aspects of data are handled by a process methodology such as Yourdon [19]. The static aspects, on the other hand, are most commonly captured with the entity-relationship (E-R) model, Chen [5].
This model was originally created to provide a conceptual schema that overcomes the lack of semantics mentioned above.

Two main disadvantages of this way of proceeding can be noted. On the one hand, operations are not defined on objects and data types. As a result, many operations belong to a particular program rather than to the database design. This creates a design inconsistency, because most operations implemented on the objects in the database program are not reflected in the conceptual schema. On the other hand, the allowed database states are discussed together with the static aspects of the database. Therefore, many dynamic constraints are required, and they are usually expressed in a text-based language.

2 Semantic data models

First, the motivation for using SDM in database modelling is reviewed. Then, a general description of SDM and the advantages they offer are given. Finally, current research on SDM is reviewed.

2.1 Motivation

A distinction must first be drawn between the earliest and the latest research on SDM. The earliest SDM were created in the 70’s. These models were primarily introduced as schema design tools, i.e. a schema could be designed in an SDM and then translated into a logical model such as the relational model. In this period, the main contribution of SDM was to provide more powerful mechanisms for representing the structural aspects of business data than those of the classical models (hierarchical, network and relational). The technique followed was to provide a higher level of abstraction for data modelling, allowing database designers to discuss data as they appear in the real world and supporting a top-down, modular view of the schema.

In recent years, database modelling research has focused on representing the behavioural aspects of data in the conceptual model.
Behavioural aspects were first considered in Brodie [2], which incorporated transactions and actions on objects and data types; thanks to the abstraction techniques supported by SDM, these were discussed at the same time as the structural aspects. These attempts have been strongly influenced by the Object-Oriented paradigm, owing to the emergence of Object-Oriented programming languages (OOPL).

2.2 General description

Many SDM in the literature do not use common terminology and are not usually defined formally. This section therefore gives a brief description of the features and components found in current research.

- Entities (objects) and entity types (classes). An entity (object) can loosely be defined as a thing that exists in the database and is distinguishable from the others without ambiguity (an atomic unit). Objects can be real-world objects or can be defined for other purposes. Objects sharing common characteristics are grouped into entity types (classes).
- Type hierarchy (ISA, classification, generalisation, specialisation). The type hierarchy is applied when objects share common properties while also having properties unique to them.
- Attributes and domains. Attributes are features defined on both entities and entity types. A domain is a set of values of similar type, and attributes take their values from these domains.
- Relationships. A relationship represents instances of an association between several objects. A relationship can be one-to-one, one-to-many or many-to-many, depending on the number of objects taking part in it. A relationship type corresponds to a collection of similar relationships or an aggregation of two or more entity types.
- Rules. Rules are a set of mechanisms that provide derived data, together with the set of constraints that limit the facts consistent with the model. Derived data are virtual data defined by the user and not stored in the database.
  Constraints, on the other hand, are used to express properties of data that cannot be captured by the data structures; they are usually restrictions. Two kinds of constraints can be distinguished: static ones (which define the allowable database states) and dynamic ones (which restrict the possible database transactions).

2.3 Advantages

The advantages offered by SDM are as follows:

a ) increased separation of logical and physical components. As mentioned in the introduction, the classical models require data to be accessed through a physical path. This holds even in the relational model, where the user must simulate pointers by comparing identifiers in order to traverse from one relation to another, Hull [14]. In contrast, the attributes of SDM may be used as direct conceptual pointers. Thus, SDM allow the user to navigate through the schema by applying attributes directly to objects.

b ) decreased semantic overloading of relationship types. The term semantic overloading refers to the fact that different kinds of relationships are represented using the same constructors. In the relational model, for example, there are only two ways of representing relationships between objects: within a relation, or by using the same values in two or more relations, Hull [14]. As can be seen in the previous section, SDM provide a rich set of constructors for representing the different kinds of real-world objects and the relationships between them.

c ) availability of convenient abstraction mechanisms. SDM provide mechanisms for viewing and accessing the logical schema at different levels of abstraction. The constructors used in SDM allow the user to access portions of the schema, i.e. objects, relationships and data types can be accessed in isolation from the others. This allows the user to obtain derived schema components and to identify a specific subset of data, possibly performing computations on it, and then structuring it in a new format.
While in the relational model derived schema components must be either new relations or new columns in existing relations, SDM provide a richer framework for derived data.

Recent research shares a common attempt to incorporate the behavioural aspects of objects into the conceptual schema, which yields two further major advantages:

d ) more consistency between logical and physical representations. Because operations are defined on objects during conceptual design, the object functions implemented in the database are exactly those defined in the conceptual schema. The programmer does not have to interpret the conceptual schema to work out which object operations must be implemented. A more straightforward transformation into the physical schema can therefore be obtained.

e ) database evolution supported by integrity constraints. To execute the operations mentioned above, a group of dynamic constraints must be checked to ensure that the new database state is allowed according to the conceptual schema. These constraints define the permitted evolution of the database. To date, the most common way of doing this is through preconditions and postconditions on the defined operations, as in Engels [11].

2.4 Review of current Semantic Data Models

A great deal of research has been carried out in this area since the 70’s. Owing to the limited length of this paper, it is not possible to reference all of it, so only the most relevant SDM are briefly mentioned.

There are two different approaches within this area, which differ in the way data are accessed and constructed. While some SDM are based on attributes (functions) applied to objects and data types, others are based on constructor types (classification, aggregation, generalisation and association).

A General Semantic Model is presented in Hull [14], which sets out the general characteristics that SDM must support in order to capture structured data.
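
The difference between the attribute-based and the constructor-based approaches can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the entity identifiers, the dictionaries `instructor_of` and `name_of`, and the classes `Person`, `Instructor` and `Course` are hypothetical names chosen for this example and are not taken from any of the models reviewed here.

```python
from dataclasses import dataclass

# --- Attribute (function) based style --------------------------------
# Entities are opaque identifiers; every attribute is a function from
# entities to values or to other entities (here, a plain dict lookup).
# All identifiers below are hypothetical example data.
instructor_of = {"course:db": "person:ana"}          # course -> person
name_of = {"person:ana": "Ana", "course:db": "Databases"}


def instructor_name(course: str) -> str:
    """Derived attribute obtained by composing two attribute functions."""
    return name_of[instructor_of[course]]


# --- Constructor based style ------------------------------------------
@dataclass(frozen=True)
class Person:                      # classification: an entity type
    name: str


@dataclass(frozen=True)
class Instructor(Person):          # generalisation: Instructor ISA Person
    department: str


@dataclass(frozen=True)
class Course:                      # aggregation: built from its components
    title: str
    instructor: Instructor


ana = Instructor(name="Ana", department="Languages")
db = Course(title="Databases", instructor=ana)
staff = frozenset({ana})           # association: a set of like objects
```

In the first style the schema is navigated by applying attributes to objects, as in `instructor_name("course:db")`; in the second, the structure of each object type is built explicitly from the four constructors, and navigation follows that structure, as in `db.instructor.name`.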
Another Semantic Data Model is presented in Hammer [13]. This model is based on the typical class structure of OOPL, in which objects and the functions that can be applied to them are defined in the class construction. The main problem with this model is that derived data are difficult to introduce into the schema, i.e. the class structure is very rigid. In addition, the class structure is not rich enough to represent all the data relationships present in the real world, and it cannot capture all the constraints needed. Thus, another way of expressing integrity constraints is required.

In the model presented in Brodie [3], special emphasis is placed on incorporating behavioural aspects of objects, in addition to providing all the constructors needed to create new objects. The great advantage of this model is that operations are represented in the conceptual schema. On the other hand, the schema is quite complex when representing the structural aspects of data; particular problems arise in representing ISA constructors and relationships.

Two recent pieces of research have extended the classical entity-relationship (E-R) model. First, two relevant advances are introduced in Chen [6]. A small modification of the SQL language is presented to make it suitable for representing integrity constraints, and the standard actions on objects (update, delete, insert and query) are treated. Nevertheless, user-defined transactions are not studied. A second extension is presented in Engels [11], where emphasis is placed on incorporating behavioural aspects of data. The SQL language is widely extended to represent both static and dynamic constraints. Nevertheless, transactions are not represented on objects in the schema and must be constructed entirely by hand.

The most relevant model based on attributes (functions) is the Functional Data Model (FDM). The first version was developed in Shipman [16].
Special emphasis is placed on derived data, which are handled by functions applied directly to attributes and objects. The FDM is supported by a specification language called DAPLEX.

3 The Object-Oriented paradigm applied to databases and SDM

The Object-Oriented paradigm is being applied to database modelling owing to the emergence of OOPL. A great deal of research on OODB has been carried out in recent years, applying the benefits offered by OOPL, mainly reusability, modularity and extensibility, Graham [12]. Moreover, objects are stored as they are defined. They are therefore ready to use, and no transformation is needed when storing or retrieving data. As a consequence, time and space savings are obtained.

Although no formal model is presently available for an Object-Oriented Design Method (OODM) for conceptual database modelling, most research and prototypes have followed a common paradigm. First, classes are defined to group similar objects. Second, new classes are constructed through inheritance. Finally, new classes are defined as requirements evolve, Chorafas [7].

The emergence of OODB has a lot in common with Semantic Data Modelling. Both use the notion of objects with unique object identity and the notion of inheritance through a hierarchy of types. However, there are some differences. OODB have properties such as encapsulation and late binding that are not supported by SDM. Thus, OODB store both data and the programs associated with objects. However, OODB lack many semantic capabilities, and their structure is quite rigid for derived data.

We think that an SDM which incorporates both the static and the dynamic aspects of objects could be a better approach to conceptual database modelling. This is an evolution towards Object-Oriented data models. Our research is currently focused on this area.
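
As a purely illustrative sketch of what describing static and dynamic aspects together might look like, the following Python fragment attaches attributes, derived data, a static constraint and an operation guarded by a precondition and a postcondition to a single entity type. The names `Account`, `SavingsAccount`, `ConstraintViolation`, `check_static` and `withdraw` are assumptions made for this example; they do not correspond to any of the models cited above, and Python classes only approximate the constructs of an SDM.

```python
from dataclasses import dataclass, field


class ConstraintViolation(Exception):
    """Raised when an operation would lead to a disallowed database state."""


@dataclass
class Account:                          # entity type (class), hypothetical
    owner: str                          # attribute
    balance: int = 0                    # attribute over an integer domain
    history: list = field(default_factory=list)

    def check_static(self) -> None:
        # Static constraint: restricts the allowable states.
        if self.balance < 0:
            raise ConstraintViolation("balance must not be negative")

    @property
    def is_empty(self) -> bool:
        # Derived data: computed on demand, never stored.
        return self.balance == 0

    def withdraw(self, amount: int) -> None:
        # Precondition: a dynamic constraint on the transaction.
        if amount <= 0 or amount > self.balance:
            raise ConstraintViolation("withdrawal not allowed")
        before = self.balance
        self.balance -= amount
        self.history.append(("withdraw", amount))
        # Postcondition: the new state must follow from the old one.
        assert self.balance == before - amount
        self.check_static()


@dataclass
class SavingsAccount(Account):          # type hierarchy: ISA Account
    rate_percent: int = 2


acct = SavingsAccount(owner="ana", balance=100)
acct.withdraw(30)                       # allowed: state changes to 70
```

Because the operation and its constraints are declared on the entity type itself, the programmer does not have to reconstruct them from a separate process model, which is the consistency argument made in advantage d) of section 2.3.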
4 Improving the current teaching methods on conceptual data modelling

The last point of this paper is to review the current teaching methods for database modelling at universities and how their quality could be improved. Not only do they follow the paradigm described in the introduction, i.e. the static and dynamic aspects of databases are taught separately, but in many universities these techniques are also taught in different years of the degree, which means a complete separation between objects and the operations defined on them.

It is widely accepted that the main problem in designing large information systems is how to integrate the different kinds of models used in the design stage, Yourdon [19]. Although many software tools are being created to automate part of this process, the problem remains for very large systems. As a consequence of this way of proceeding, students are not ready to solve all the problems found in industry. Only experience in designing very large systems allows them to cope with these problems. This teaching method could therefore be improved by the SDM suggested in the previous section.

Nevertheless, the problem with teaching this new approach is that there is neither sufficient formalism nor tool support for the logical design schema. This is the reason why the current paradigm is still being taught at universities, although its disadvantages are widely known.

5 Conclusion

The purpose of this paper has been to provide a general study of database modelling. Although current business databases are developed using the classical models and paradigm, many disadvantages have been analysed in the previous sections. Owing to these disadvantages, many researchers have proposed SDM since the 1970’s to overcome the deficiencies of this classical way of proceeding.
To improve the quality of database modelling, the latest research focuses on discussing the static and dynamic aspects of databases at the same time during conceptual design. Unfortunately, no formal method is presently available to achieve this. A formal Object-Oriented data model is expected to be found, not only to be applied directly in industry but also to be taught at universities.

6 References

[1] Beynon-Davies, P. “Relational database systems”. Ed. Blackwell Scientific Publications. Chapter 8. 1991
[2] Brodie, M., Ridjanovic, D. “On the design and specification of Database Transactions”. 1. 1984
[3] Brodie, M., Silva, E. “Active and Passive component modelling: ACM/PCM”. Information systems design methodologies: A Comparative Review. Ed. Olle, Sol, Verrijn-Stuart. North-Holland Publishing Company. 1982
[4] Canós, J.H. “OASIS: un lenguaje único para Bases de Datos Orientadas a Objetos”. PhD Thesis. U.P. Valencia. 1996
[5] Chen, P. “The Entity-Relationship model”. ACM Trans. Database Systems 1(1). 9-36. 1976
[6] Chen, P. “The Entity-Relationship Model: Toward a Unified View of Data”. Readings in Artificial Intelligence and Databases. Ed. Mylopoulos, J., Brodie, M. 1989
[7] Chorafas, D., Steinmann, H. “Object-Oriented Databases”. Ed. Prentice Hall. 1993
[8] Data base task group report. “CODASYL”. ACM, New York. 1971
[9] Codd, E.F. “A relational model of data for large shared data banks”. Communications ACM 13, 377-387. 1970
[10] Codd, E.F. “The relational model for Database Management, Version 2”. Addison-Wesley, Reading, Ma. 1990
[11] Engels, G., Gogolla, M., Hohenstein, U., Hulsmann, K., Lohr-Richter, P., Saake, G. and Ehrich, M. “Conceptual modelling of database applications using an extended ER”. Data and Knowledge Engineering, 9 (1992/93). 157-204. North-Holland. 1993
[12] Graham, I. “Object Oriented Methods”. Addison-Wesley. 1994
[13] Hammer, M., McLeod, D. “Database Description with SDM: A Semantic Database Model”. ACM Transactions on Database Systems, Vol. 6, No. 3. September 1981. Pages 351-386
[14] Hull, R., King, R. “Semantic Database Modeling: Survey, Applications and Research Issues”. ACM Computing Surveys, Vol. 19, No. 3. Sept. 1987
[15] Khoshafian, S. “Object-Oriented Databases”. Ed. Wiley Professional Computing. 1993
[16] Shipman, D. “The Functional Data Model and the Data Language DAPLEX”. ACM Transactions on Database Systems, Vol. 6, No. 1. March 1981
[17] ter Bekke, J. “Semantic Data Modeling”. Ed. Prentice-Hall. 1992
[18] Tsichritzis, Lochovsky. “Hierarchical data-base management: a survey”. ACM Computing Surveys 8, 105-123. 1976
[19] Yourdon, E. “Modern Structured Analysis”. Ed. Prentice Hall. 1993