University: Mälardalens högskola
Institution: IDT
Course: Object Oriented Development, advanced course
Lecturer: Mikael Sandberg
Date: 2000-05-18

Object-Oriented Databases

Halima Maronsi, hmi98001@student.mdh.se
David Löfqvist, dlt99001@student.mdh.se
Monica Larsson, mln99005@student.mdh.se

Summary
We are three students attending the course Object Oriented Development, advanced course, at Mälardalen University. In this report we briefly explain the mysteries of object-oriented database systems. The report covers different aspects of databases: we begin with a short history lesson, then look at relational DBSs, object-oriented modelling, object-oriented database development, and Ada95 together with databases, and we end with an interview with a consultant working at ABB.

Day after day, database systems play an increasingly important role in firms, companies and other organisations. Global networking increases the importance of DBSs, and "data autobahns" mean a lot for today's communication and economy. At the same time, the growing volume of information, the increasing complexity of users' requests, and the need to distribute and share information on the Internet make a systematic implementation of DBSs more and more necessary. Getting an overview of the history of databases is the first step into this world.

List of contents
1. Summary
2. Dictionary and used abbreviations
3. History
4. Relational Databases
4.1 What is the difference between relational and object-oriented systems?
4.2 What is relational?
4.3 What is a database server?
4.4 Primary and foreign keys
4.5 Referential integrity
4.6 Normalisation
5. Object-oriented modelling
5.1 Introduction
5.2 The phases of an object-oriented system development model
5.3 Modelling the application, keywords and explanations
5.3.1 Class
5.3.2 Object
5.3.3 State
5.3.4 Behaviour
5.3.5 Associations, or relationships
6. OODBMS
6.1 Different approaches to developing an OODBMS
6.2 The OODBMS manifesto
6.3 OODBMS advantages
6.4 OODBMS disadvantages
6.5 OO database development: code examples
6.5.1 Defining a class
6.5.2 Defining a relationship
6.5.3 Defining generalisation
6.5.4 Creating an object
6.5.5 OQL examples
6.5.6 Select
6.5.7 Aggregate operators
6.5.8 Group by
7. Ada: a choice for OODBMS?
7.1 Ada features that support OODBMS
7.2 Built-in concurrent programming
7.3 Tasks
7.4 Possible enhancements to Ada
8. Interview with a consultant at ABB BUS
9. Conclusion
10. References

A small dictionary
Database = An organised collection of logically related data.
Views = A view is a virtual relation that does not actually exist in the database but is produced upon request. Views provide a powerful and flexible security mechanism by hiding parts of the database from certain users.
Persistence = The storage of objects is managed by the OODBMS, so the objects survive after the user session or application program has terminated. The opposite is transient: the objects' memory is allocated and deallocated by the programming language's runtime system, and the objects last only for one invocation of the program.
Impedance mismatch = The problem that arises when two different paradigms are mixed, for example a database language and a programming language, so that data must be converted from one paradigm to the other.
Transaction = An action or series of actions, carried out by a single user or application program, which accesses or changes the contents of the database.
Metadata = Data that describe the properties or characteristics of other data.
Database application = An application program that is used to perform a series of database activities on behalf of database users.
Data warehouse = An integrated decision-support database whose content is derived from the various operational databases.
Data independence = The separation of data descriptions from the application programs that use the data.
User view = A logical description of some portion of the database that is required by a user to perform some task.
Constraint = A rule that cannot be violated by database users.

Used abbreviations
DBMS = Database Management System, a software application that is used to create, maintain, and provide controlled access to user databases.
OO = Object-Oriented
OOP = Object-Oriented Programming
DML = Data Manipulation Language
ODL = Object Definition Language, the equivalent of the DDL (Data Definition Language) used in traditional DBMSs.
OQL = Object Query Language, the equivalent of the DML used in traditional DBMSs.
DBS = Database System
OODBMS = Object-Oriented Database Management System
RDBMS = Relational Database Management System
BLOBs = Binary Large Objects. A data value that contains binary information representing an image, a digitised video or audio sequence, or a large unstructured object.
BUS = (ABB) Business Systems
ER = Entity-Relationship
VB = Visual Basic
OOPL = Object-Oriented Programming Language

History
The historical development of database systems can be divided into three generations.

First generation
The hierarchical and network models (late 1960s and 1970s). File processing systems were still dominant during the 1960s. The first database management systems were introduced during that decade and were used primarily for large and complex ventures such as the Apollo moon-landing project.
We can regard this as an experimental "proof-of-concept" period in which the feasibility of managing vast amounts of data with a DBMS was demonstrated. The first efforts at standardisation were also made, with the formation of the Data Base Task Group in the late 1960s.

During the 1970s the use of database management systems became a commercial reality. The hierarchical and network database management systems were developed largely to cope with increasingly complex data structures, such as manufacturing bills of materials, that were extremely difficult to manage with conventional file processing methods. The hierarchical and network models are generally regarded as first-generation DBMSs. Both approaches were widely used, and in fact many of these systems are still in use today. They have some major disadvantages:
1. Difficult access to data.
2. Very limited data independence.
3. No widely accepted theoretical foundation for either model.

Second generation
To overcome these limitations, E. F. Codd and others developed the relational data model during the 1970s. This model, considered the second generation of DBMSs, received widespread commercial acceptance and diffusion during the 1980s. With the relational model, all data are represented in the form of tables. A relatively simple fourth-generation language called SQL (Structured Query Language) is used for data retrieval. The relational model thus provides ease of access for non-programmers, overcoming one of the major objections to first-generation systems.

Third generation
The 1990s brought a new era of computing, first with client/server computing and then with Internet applications becoming increasingly important. Whereas the data managed by a DBMS during the 1980s was largely structured (such as accounting data), multimedia data (including graphics, sound, images, and video) became increasingly common during the 1990s.
To cope with these increasingly complex data, object-oriented databases (considered the third generation) were introduced during the late 1980s, and they became increasingly important during the 1990s.

What is the Difference Between Relational and Object-Oriented Database Systems?
[Figure, two panels: "Relational database of a cat" and "Object-oriented database of a cat".]

What is Relational?
The word relational, when applied to database systems, has a specific definition: it refers to a mathematical and computer-science theory developed by Dr. E. F. Codd. This approach proposes a data representation and storage scheme that uses relational algebra, with its corresponding mathematical and logical properties, as an optimal means of storing and accessing data in a database system.

Relational Database Management System (RDBMS)
A collection of integrated services which together support and control the creation, use and maintenance of relational databases.

A relational database stores all its data inside tables, and nothing more. All operations on data are performed on the tables themselves or produce other tables as the result; you never see anything except tables. A table is a set of rows, all with the same columns. This is very important, because a set has no predefined sort order for its elements. Each row has exactly one value for each column. Some columns may hold NULL values, i.e. no value was ever supplied for that column in that row. Note that a NULL value in a string column is different from an empty string: you should think of NULL as an "unknown" value. A row in a relational table is analogous to a record, and a column to a field.
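To make the NULL rules concrete, here is a minimal sketch using Python's standard sqlite3 module, with SQLite standing in for a relational DBMS; the table T and its rows are hypothetical. It assumes SQLite's behaviour, where any comparison involving NULL yields NULL (None in Python) rather than true or false. Other products can differ in details; Oracle, for instance, treats an empty string as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL means "unknown": comparing it with anything, even another NULL,
# yields NULL (None in Python). Only IS NULL tests for it reliably.
row = conn.execute("SELECT NULL = '', NULL = NULL, NULL IS NULL, '' IS NULL").fetchone()

# Hypothetical table with one NULL, one empty string and one real address.
conn.execute("CREATE TABLE T (E_MAIL TEXT)")
conn.executemany("INSERT INTO T VALUES (?)", [(None,), ("",), ("x@example.org",)])

# WHERE keeps a row only when the condition is true, so the NULL row
# matches neither "= ''" nor "<> ''".
not_empty = conn.execute("SELECT COUNT(*) FROM T WHERE E_MAIL <> ''").fetchone()[0]
empty = conn.execute("SELECT COUNT(*) FROM T WHERE E_MAIL = ''").fetchone()[0]
missing = conn.execute("SELECT COUNT(*) FROM T WHERE E_MAIL IS NULL").fetchone()[0]
```

The first query returns (None, None, 1, 0), and each of the three counts is 1: the empty string and the NULL are different values, and the NULL row is only found with IS NULL.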
Here is an example of a table and the SQL statement that creates it:

    CREATE TABLE ADDR_BOOK (
        NAME      CHAR(30),
        EDUCATION CHAR(25),
        E_MAIL    CHAR(25)
    );

    NAME             EDUCATION              E_MAIL
    David Löfqvist   Computer_Science       dlt99001@student.mdh.se
    Monica Larsson   Computer_Science       mln99005@student.mdh.se
    Halima Maronsi   Computer_Engineering   hmi98001@student.mdh.se

There are two basic operations you can perform on a relational table. The first is retrieving a subset of its columns (projection); the second is retrieving a subset of its rows (selection). Here are samples of the two operations:

    SELECT NAME, E_MAIL FROM ADDR_BOOK;

    NAME             E_MAIL
    David Löfqvist   dlt99001@student.mdh.se
    Monica Larsson   mln99005@student.mdh.se
    Halima Maronsi   hmi98001@student.mdh.se

    SELECT * FROM ADDR_BOOK WHERE EDUCATION = 'Computer_Engineering';

    NAME             EDUCATION              E_MAIL
    Halima Maronsi   Computer_Engineering   hmi98001@student.mdh.se

You can also combine these two operations, as in:

    SELECT NAME, E_MAIL FROM ADDR_BOOK WHERE EDUCATION = 'Computer_Engineering';

    NAME             E_MAIL
    Halima Maronsi   hmi98001@student.mdh.se

You can also perform operations between two tables, treating them as sets: you can form the Cartesian product of the tables, take the intersection of two tables, add one table to another (union), and so on.

What is a database server?
It is a specialised process that manages the database itself. The applications are clients of the database server: they never manipulate the database directly, but only make requests for the server to perform these operations. This allows the server to add many sophisticated features, such as transaction processing, recovery, backup and access control, without increasing the complexity of every application. The server also reduces the risk of data file corruption, if only because the server alone writes to the database (a crash on a client machine will not leave unflushed buffers).
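The table operations above can be reproduced with a small sketch, again assuming Python's sqlite3 module as a stand-in RDBMS. It creates ADDR_BOOK, runs the projection, selection and combined queries, and then uses a hypothetical second table, ADA_COURSE (with an invented row "Student X"), to show intersection, union and Cartesian product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ADDR_BOOK (NAME CHAR(30), EDUCATION CHAR(25), E_MAIL CHAR(25))")
conn.executemany(
    "INSERT INTO ADDR_BOOK VALUES (?, ?, ?)",
    [
        ("David Löfqvist", "Computer_Science", "dlt99001@student.mdh.se"),
        ("Monica Larsson", "Computer_Science", "mln99005@student.mdh.se"),
        ("Halima Maronsi", "Computer_Engineering", "hmi98001@student.mdh.se"),
    ],
)

# Projection: a subset of the columns. ORDER BY makes the order explicit.
projection = conn.execute("SELECT NAME, E_MAIL FROM ADDR_BOOK ORDER BY NAME").fetchall()

# Selection: a subset of the rows.
selection = conn.execute(
    "SELECT * FROM ADDR_BOOK WHERE EDUCATION = 'Computer_Engineering'"
).fetchall()

# Projection and selection combined.
combined = conn.execute(
    "SELECT NAME, E_MAIL FROM ADDR_BOOK WHERE EDUCATION = 'Computer_Engineering'"
).fetchall()

# Set operations need a second table; ADA_COURSE and its rows are hypothetical.
conn.execute("CREATE TABLE ADA_COURSE (NAME CHAR(30))")
conn.executemany("INSERT INTO ADA_COURSE VALUES (?)", [("Halima Maronsi",), ("Student X",)])

intersection = conn.execute(
    "SELECT NAME FROM ADDR_BOOK INTERSECT SELECT NAME FROM ADA_COURSE"
).fetchall()
union = conn.execute(
    "SELECT NAME FROM ADDR_BOOK UNION SELECT NAME FROM ADA_COURSE"
).fetchall()  # UNION removes duplicates, as set union should
product_size = conn.execute(
    "SELECT COUNT(*) FROM ADDR_BOOK CROSS JOIN ADA_COURSE"
).fetchone()[0]  # 3 rows x 2 rows
```

Note the ORDER BY in the first query: because a table is a set, SQL promises no particular row order unless one is requested.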
The key concepts that you must understand in order to design a database properly are primary and foreign keys, which are used to define relationships; referential integrity, which is used to maintain the validity of the data; and normalisation, which is used to develop a sound data structure. Once you have these concepts down, the rest of the details will fall into place more easily.

Primary and Foreign Keys
A key is simply a field that can be used to identify a record. Each row in a table corresponds to one item or record. The position of a row may change whenever rows are added or deleted, so items stored in the tables can be identified only by their values. To uniquely identify an item, it must be given a field or set of fields that is guaranteed to be unique within its table. This is called the primary key of the table.

For example, say Monica Larsson has changed her e-mail address. How do we know which row to update? Given the ADDR_BOOK table presented above:

    UPDATE ADDR_BOOK
    SET E_MAIL = 'mln99005@hotmail.com'
    WHERE NAME = 'Monica Larsson';

So the column NAME identifies each row of ADDR_BOOK, and NAME is said to be the primary key of table ADDR_BOOK. (This works only as long as no two people in the address book share a name.)

If a row in one table contains the primary key of a row in another table, this creates a relationship between the two items, and the column is called a foreign key. Example:

    Invoice
    Invoice number   Customer number   Date
    2345             2332              5/10/00
    2356             4321              3/2/00

Invoice number is the primary key of the Invoice table. Customer number is a foreign key referring to a row in the Customers table:

    Customers
    Customer number   Name
    2332              Halima Maronsi
    1432              Monica Larsson

Referential integrity
Let us consider what happens when you start manipulating the records involved in an order entry system such as this one. You can edit the customer information at will without any ill effects, but what would happen if you needed to delete a customer? If the customer has orders, the orders will be orphaned.
Clearly you cannot have an order placed by a non-existent customer, so you must have something that enforces a corresponding customer for each order. This is the basis of enforcing referential integrity. There are two ways to enforce the validity of the data in this situation: one is to cascade deletions through the related tables, the other is to prevent deletions when related records exist. Database applications have several options for enforcing referential integrity, but if possible you should let the database engine do its job and handle this for you. The latest advanced database engines allow you to use declarative referential integrity: you specify a relationship between tables at design time, indicating whether updates and deletes will cascade through related tables. If cascading updates are enabled, changes to the primary key in a table are propagated through related tables. If cascading deletes are enabled, deletions from a table are propagated through related tables.

Normalisation
Normalisation is the process of refining the structure of the database until repeating groups of data have been moved into separate tables. A useful guide to designing relational databases is the set of rules that define the first three normal forms:
1. All column values are atomic.
2. All column values depend on the value of the (whole) primary key.
3. No column value depends on the value of any column other than the primary key.
When you have applied the three rules, the database is said to be in Third Normal Form (3NF), or simply "normalised". A normalised database reduces redundancy, which lowers storage requirements, often speeds up updates, and makes it easier to change the application to add new features.
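The key and referential-integrity rules described above can be sketched with declarative foreign keys in SQLite, through Python's sqlite3 module; the helper function names are hypothetical. The Invoice and Customers data from the example is reused. Invoice 2356 refers to customer 4321, which is not among the Customers rows shown, so with foreign keys enforced the DBMS rejects that invoice. The sketch then compares a cascading delete with a prevented (RESTRICT) delete.

```python
import sqlite3


def make_db(on_delete: str) -> sqlite3.Connection:
    """Create the Customers/Invoice example in memory.

    on_delete is the declarative referential action, e.g. "CASCADE" or
    "RESTRICT". It is pasted into the SQL text, so pass only trusted keywords.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.execute(
        "CREATE TABLE CUSTOMERS ("
        " CUSTOMER_NO INTEGER PRIMARY KEY,"
        " NAME TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE INVOICE ("
        " INVOICE_NO INTEGER PRIMARY KEY,"
        " CUSTOMER_NO INTEGER NOT NULL"
        f" REFERENCES CUSTOMERS (CUSTOMER_NO) ON DELETE {on_delete},"
        " INVOICE_DATE TEXT)"
    )
    conn.executemany(
        "INSERT INTO CUSTOMERS VALUES (?, ?)",
        [(2332, "Halima Maronsi"), (1432, "Monica Larsson")],
    )
    conn.execute("INSERT INTO INVOICE VALUES (2345, 2332, '5/10/00')")
    return conn


def insert_invoice(conn, invoice_no, customer_no, invoice_date) -> bool:
    """Return True if the DBMS accepts the invoice, False if the foreign key rejects it."""
    try:
        conn.execute(
            "INSERT INTO INVOICE VALUES (?, ?, ?)", (invoice_no, customer_no, invoice_date)
        )
        return True
    except sqlite3.IntegrityError:
        return False


def delete_customer(conn, customer_no) -> bool:
    """Return True if the delete succeeds (possibly cascading), False if it is prevented."""
    try:
        conn.execute("DELETE FROM CUSTOMERS WHERE CUSTOMER_NO = ?", (customer_no,))
        return True
    except sqlite3.IntegrityError:
        return False


def invoice_count(conn) -> int:
    return conn.execute("SELECT COUNT(*) FROM INVOICE").fetchone()[0]
```

With make_db("CASCADE"), deleting customer 2332 also deletes invoice 2345. With make_db("RESTRICT"), the same delete is refused while the invoice exists, but customer 1432, who has no invoices, can still be deleted. Either way the engine, not the application, keeps the orders from being orphaned.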
Object-oriented modelling
The object-oriented approach is becoming popular because it supports effective representation of real-world applications, it can represent complex relationships, and it can represent both data and data processing in a consistent notation. An object-oriented model is built from objects, whereas an ER model (Entity-Relationship model) uses entities. An object encapsulates data and behaviour, so the object-oriented model can be used for both data modelling and process modelling.

The phases of an object-oriented system development model
The model proceeds from the abstract, focusing on the external qualities of the system, to the more and more detailed, focusing on how the system will be built and how it should function.

Analysis: develop a model of the real-world application showing its important properties. Abstract concepts from the application domain and describe what the system will do, rather than how it will be done. Structure the requirements and make sure they are really understood. The analysis model specifies the functional behaviour of the system, independent of its environment.

Design: the design phase looks at how the analysis model will be implemented in its environment: what operations an object provides, what sort of communication takes place between objects, what messages will be sent, and so on. It produces an overall system architecture, organising the system into components called subsystems, and builds up the model by adding implementation details: data structures, algorithms and control.

Implementation: use a programming language and the database management system.

Coad and Yourdon (1991b) identify several motivations for and benefits of object-oriented modelling, for example:
• Improved communication between users, analysts, designers and programmers.
• The ability to solve more complex problem domains.

The UML notation can be useful to graphically depict an OO analysis or design model.

Modelling the application: some keywords and explanations
In an ER model the entity can be seen as an object. But an object stores both state and the behaviour that affects or examines the state.

Class: a template describing how objects will be created and how they will be represented in terms of state and behaviour; a class is supposed to encapsulate the internal state. A class can be abstract or concrete. An abstract class does not implement some or all of its methods; instead it requires that its subclasses implement them. A concrete class implements all of its methods.

Object: an entity that has a well-defined role in the application. It has state, behaviour and identity. A key part of the definition of an object is a unique identity. In an object-oriented system each object is assigned an OID (Object Identifier) when it is created. The OID is system-generated and unique to that object. Once an OID has been assigned it is never reused, even if the object is deleted. It is independent of the values of the attributes, and it should be invisible to the user. Objects communicate by sending messages. A message is simply a request from one object to another. An object sending a message to another object does not have to know anything about the receiver's internal state; that is what encapsulation is about.

State: holds an object's properties (attributes and relationships) and the values of those properties.

Behaviour: how an object acts and reacts, i.e. the operations or methods an object provides. There are three types of operations:
- Constructor operations, which create a new instance of a class.
- Query operations, which access the state.
- Update operations, which alter the state of an object.
An operation can also be abstract: it defines the form of an operation, but not its implementation. Methods can be overridden when inheritance is used.

[Figure: the class Student (attributes Name, dateOfBirth, ...; operations getAge(), getDateOfBirth()) and the object Steve: Student (Name = Steve, DateOfBirth = ...).]
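The Student example can be sketched as Python classes. This is an illustration under stated assumptions, not how an OODBMS actually stores objects: the abstract class PersistentObject and the counter that hands out OIDs are hypothetical stand-ins for what the OODBMS itself would do, and get_age() takes today's date as a parameter (unlike getAge() in the figure) so that its result is reproducible.

```python
from abc import ABC, abstractmethod
from datetime import date
from itertools import count

# Hypothetical OID generator: in a real OODBMS the system assigns OIDs itself.
_oid_source = count(1)


class PersistentObject(ABC):
    """Abstract class: it declares describe() but leaves the body to subclasses."""

    def __init__(self) -> None:
        # Identity is fixed at creation and independent of any attribute value.
        self._oid = next(_oid_source)

    @property
    def oid(self) -> int:
        return self._oid

    @abstractmethod
    def describe(self) -> str:
        """Abstract operation: the form is defined here, the implementation in subclasses."""


class Student(PersistentObject):
    """Concrete class mirroring the Student figure (name, dateOfBirth, getAge())."""

    def __init__(self, name: str, date_of_birth: date) -> None:  # constructor operation
        super().__init__()
        self._name = name  # state is kept private (encapsulation)
        self._date_of_birth = date_of_birth

    def get_date_of_birth(self) -> date:  # query operation
        return self._date_of_birth

    def get_age(self, today: date) -> int:  # query operation on derived state
        had_birthday = (today.month, today.day) >= (
            self._date_of_birth.month,
            self._date_of_birth.day,
        )
        return today.year - self._date_of_birth.year - (0 if had_birthday else 1)

    def rename(self, new_name: str) -> None:  # update operation
        self._name = new_name

    def describe(self) -> str:
        return f"Student {self._name} (OID {self.oid})"
```

Two students created with identical names and birth dates still get different OIDs, and renaming a student changes its state but not its OID, which is the distinction between identity and attribute values made above. Trying to instantiate PersistentObject directly raises TypeError because the class is abstract.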
Associations, or relationships
An association is a relationship between objects. An association can be unary (between objects of the same class), binary (between two objects), or ternary (between three objects), or of even higher degree. The multiplicity of an association indicates how many objects participate in a given relationship.

[Figure: unary relationship Is-married-to on Person, with multiplicity 0..1 at both ends.]

In a binary relationship the multiplicity between objects can be one-to-one, one-to-many or many-to-many. An association between objects can itself have attributes and operations; such an association class is attached to the association with a dashed line.

[Figure: many-to-many (* to *) association between Student and Course, with the association class Registration (attribute grade, operation getGrade()).]

An association can be an aggregation, a part-of relationship between objects, also known as a has-a relationship; composition is a stronger form of aggregation. For example, a PC has a CPU.

[Figure: generalisation, with Mammal as a subclass of Animal.]

The OO model expresses generalisation relationships using superclasses and subclasses.

[Figure: multiple inheritance, with Amphibious as a subclass of both Car and Boat.]

Inheritance can also be multiple. Inheritance is a very powerful mechanism because it supports code reuse and provides polymorphism. Polymorphism is a key concept in OO systems. There are three types of polymorphism: operation, inclusion and parametric. A method defined in a superclass and inherited by its subclasses is an example of inclusion polymorphism. Parametric, or generic, descriptions act as a template for the later establishment of one or more different types. Operation polymorphism (overloading) allows the name of a method to be reused a) within a class definition or b) across class definitions.
a) Which method is executed depends on the parameters passed to the method.
b) For example, a superclass has an abstract method print(), and each subclass has its own print() method too. A variable declared with the superclass type can hold a value of any of the subclasses, and which print() method is executed depends on which subclass value the superclass variable holds.
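Example b) can be sketched in Python, where every method call is dynamically bound. Python has no compile-time overloading, so example a) is not shown; the class names follow the Shape/Circle/Rectangle figure, and the strings returned by print() are illustrative only.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List


class Shape(ABC):
    @abstractmethod
    def print(self) -> str:
        """Abstract operation: every concrete subclass must supply its own print()."""


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def print(self) -> str:
        return f"Circle with radius {self.radius}"


class Rectangle(Shape):
    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def print(self) -> str:
        return f"Rectangle {self.width} x {self.height}"


def print_all(shapes: Iterable[Shape]) -> List[str]:
    # Each element is only known to be a Shape; the print() that runs is chosen
    # at run time from the object's actual class (dynamic binding).
    return [shape.print() for shape in shapes]
```

For example, print_all([Circle(1.5), Rectangle(2, 3)]) calls Circle.print() and then Rectangle.print(), although the loop variable is only ever treated as a Shape.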
[Figure: superclass Shape with print(); subclasses Circle and Rectangle, each with its own print().]

The process of selecting the appropriate method based on an object's type is called binding. There is static binding and dynamic binding. Static, or early, binding refers to binding performed at compile time. Dynamic, or late, binding refers to binding performed at run time, as in example b) above.

OODBMS
Object-oriented approaches were first developed as a result of research into more effective programming techniques. As a result of this approach, a set of criteria for the development of object-oriented databases has begun to emerge. Development of object-oriented databases is still in its early stages. Due to the newness of the technique and the lack of programming expertise, production has so far been restricted to a low volume of applications. Object-oriented databases are best suited to environments that are extremely complex but have well-defined operating parameters.

Object orientation is a "recent" approach to software construction that shows considerable promise for solving some of the classic problems of software development. The underlying concept behind object technology is that all software should be constructed out of standard, reusable components wherever possible. Traditionally, software engineering and database management have existed as separate disciplines. Database technology has concentrated on the static aspects of information storage, while software engineering has modelled the dynamic aspects. With the third generation of databases, the two disciplines have been combined to allow the concurrent modelling of both data and the processes acting upon the data.

Database systems are often concerned with the creation and maintenance of large, long-lived collections of data. Modern database systems support the following features:
• The data model: a particular way of describing data, relations and constraints.
• Data persistence: the data can be stored "forever".
• Data sharing: multiple applications or users can access the data at the same time.
• Reliability: the data should be protected from hardware and software failures.
• Scalability: the ability to operate on large amounts of data.
• Security and integrity: the data is protected from unauthorised access, and its correctness and consistency are assured.
• Distribution: collections of shared data can be physically distributed over a computer network, with the distribution made transparent to the user.

Future
Industry experts predict that OODBMSs represent the most promising of the emerging database systems, while relational DBMSs are expected to maintain their hold on the market for traditional business applications. But many applications, such as CAD, CASE and GIS, work with complex data, and accessing such data from various relational tables requires joins, which are extremely costly. These applications need support that an RDBMS cannot give, namely an OODBMS.

There are several approaches to developing an OODBMS:
1. Extend an existing OOPL with database capabilities: traditional database capabilities are added to the language.
2. Provide extensible OODBMS libraries: rather than extending the language, class libraries are provided that support persistence, transactions and so on.
3. Embed OO database language constructs in a conventional host language, for example C.
4. Extend an existing database language with OO capabilities, as in the next release of the SQL standard, SQL3.
5. Develop a new database data model/data language: a radical approach that develops an entirely new database language and an entirely new OODBMS.

One attempt to define what an OODBMS must provide is the Object-Oriented Database System Manifesto (Atkinson et al., 1989a). The manifesto proposed 13 mandatory features:
1. Complex objects must be supported.
2. Object identity must be supported.
3. Encapsulation must be supported.
4. Classes must be supported.
5. Classes must be able to inherit from their ancestors.
6. Dynamic binding must be supported.
7. The DML must be computationally complete, i.e. a general-purpose programming language.
8. The set of data types must be extensible, with no distinction in usage between system-defined and user-defined types.
9. Data persistence must be provided.
10. The DBMS must be capable of managing very large databases.
11. The DBMS must support concurrent users.
12. The DBMS must be able to recover from hardware and software failures.
13. The DBMS must provide a simple way of querying data.

OODBMSs have several advantages:
1. Enriched modelling capabilities. The OO model allows the "real world" to be modelled more closely. An object can store all the relationships it has with other objects, including many-to-many relationships, and objects can be combined into complex objects.
2. Extensibility. New data types can be built from existing types (superclasses and subclasses). This can reduce redundancy, and overriding makes special cases easy to handle. The reusability of classes also promotes faster development and easier maintenance.
3. Removal of impedance mismatch.
4. More expressive query language. An OODBMS provides navigational access from one object to another, whereas SQL provides associative access. Navigational access is better suited to handling parts explosion, recursive queries and so on.
5. Support for schema evolution. The tight coupling between data and applications in an OODBMS makes schema evolution easier. Generalisation and inheritance allow the schema to be better structured, to be more intuitive, and to capture more of the application's semantics.
6. Support for long-duration transactions.
7. Applicability to advanced database applications.
8. Improved performance.

There are some disadvantages of OODBMSs, too:
1. Lack of a universal data model.
2. Lack of experience.
3. Lack of standards.
4. Query optimisation compromises encapsulation.
5. Locking at object level may impact performance.
6. Complexity. The increased functionality provided by an OODBMS makes it more complex than a traditional DBMS.
   The complexity can lead to products that are more expensive and more difficult to use.
7. Lack of support for views.
8. Lack of support for security. If OODBMSs are to expand fully into the business field, these deficiencies must be remedied.

OO database development
A conceptual OO model (perhaps described in UML) can be transformed into a logical ODL schema. Here we show some examples.

Defining a class:

    class Student (extent students) {
        attribute string  name;
        attribute Date    dateOfBirth;     /* structured type */
        attribute Address address;         /* user-defined structure */
        relationship set<CourseOffering> takes
            inverse CourseOffering::taken_by;
        short age();                       /* operation */
    };

Defining a user-defined structure:

    struct Address {
        string street_address;
        string city;
        string country;
    };

Defining a relationship:
[Figure: the relationship between Student and CourseOffering, labelled takes from Student to CourseOffering and taken_by in the opposite direction.]

Traversing from Student to CourseOffering gives the relationship takes; traversing in the opposite direction gives the relationship taken_by. In CourseOffering, the inverse relationship is specified as:

    relationship set<Student> taken_by
        inverse Student::takes;

Defining generalisation:
[Figure: Employee as the superclass of ExtraEmployee and RegularEmployee.]

    class Employee (extent employees) {
        // ...
    };
    class ExtraEmployee extends Employee (extent ex_emp) {
        // ...
    };
    class RegularEmployee extends Employee (extent reg_emp) {
        // ...
    };

The extent of a class is the set of all instances of that class within the database. For example, the extent ex_emp refers to all the ExtraEmployee instances in the database.

Creating object instances:

    David Student {name: "David Löfqvist", dateOfBirth: ...}

Here we show some examples of OQL:

    David.dateOfBirth      /* returns David's date of birth */
    David.address.city     /* returns Västerås */

You can also use the select statement:

    select s.name
    from students s
    where s.address.city = "Västerås"

There are some aggregate operators you can use: count, sum, avg, max and min.

    count(students)                          /* counts all instances of Student */
    max(select e.salary from employees e)    /* the highest salary of all employees */

You can partition a query result into chosen groups:

    select max(e.salary)
    from employees e
    group by e.gender

Ada: a choice for object-oriented databases?
Object-oriented programming languages (OOPLs) allow application developers to write object-oriented programs that run in main memory. More complex object-oriented applications require persistent storage and concurrency control for their objects. Object-oriented database management systems (OODBMSs) were created to satisfy these requirements: an OODBMS adds persistence and concurrency control to an existing OOPL. The Object Data Management Group (ODMG) has published a standard for OODBMSs (ODMG-93, release 1.2) with language bindings for C++ and Smalltalk. These are capable OOPLs, but Ada95 has features that make it a superior choice for building OODBMSs. Unlike C++ or Smalltalk, Ada has built-in support for concurrent programming, as well as storage pools, which allow Ada objects to reside in persistent storage.

The complexity required to provide the features desired in many modern products has made embedded processors and their associated software a necessity. As a result, more and more of a product's functionality is provided by software. OOPLs have been created to reduce the complexity and cost of this software. Modern OOPLs, like their predecessors, are designed to allow an application developer to create a complex sequence of instructions with minimal difficulty.

The "sequence of instructions" paradigm for computer programs begins to break down when data management is required. Managing concurrent reads and writes to a piece of data is a complex task, as is ensuring consistency between related pieces of data. Database managers (DBMSs) are designed to perform these tasks so that application software does not have to manage its data but can instead merely use it.
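As a small illustration of a DBMS, rather than the application, keeping related data consistent, the following sketch uses Python's sqlite3 module; the ACCOUNT table and the transfer() function are hypothetical. Both updates run in one transaction, so if the second one violates the CHECK constraint, the first is rolled back as well. The sketch covers only the consistency side; concurrency control is not exercised here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ACCOUNT ("
    " ID INTEGER PRIMARY KEY,"
    " BALANCE INTEGER NOT NULL CHECK (BALANCE >= 0))"
)
conn.executemany("INSERT INTO ACCOUNT VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move amount from src to dst as one transaction: both updates happen or neither."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE ACCOUNT SET BALANCE = BALANCE + ? WHERE ID = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE ACCOUNT SET BALANCE = BALANCE - ? WHERE ID = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected a negative balance
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT ID, BALANCE FROM ACCOUNT ORDER BY ID"))
```

A transfer of 150 from account 1 fails and leaves both balances untouched, even though the credit to account 2 had already been executed; a transfer of 40 succeeds. The application never has to undo the partial update itself.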
Traditionally, however, the interfaces to DBMSs have been "database languages", e.g. SQL, which have their own syntax and type systems that are incompatible with those of programming languages. To circumvent this problem, programs that require a database manager have relied on special mechanisms, e.g. "embedded SQL", to access the database manager, along with type conversion routines that translate between the type system of the programming language and the type system of the database language. This adds complexity to the application software by making it handle two kinds of data: "regular" data (variables, etc.) and "database data" (objects that must be translated to and from the database language). This situation is shown on the left side of Figure 1.

[Figure 1. Traditional vs. object database manager in an OOPL application. Left panel: Traditional DBMS, with an interface between the application's data and the DB. Right panel: Object-oriented DBMS.]

OODBMSs, however, are designed to use the same type system and syntax as a given OOPL. This way, no database language translation is required, and as a result database objects look like regular objects (see the right side of Figure 1). An OODBMS therefore extends an OOPL to include database capabilities. These capabilities include:
Persistence: the data continues to exist after the program that created it has terminated, and the data is not lost if the power is turned off.
Concurrency control: the data can be read and written by concurrent programs without becoming corrupted.

Ada has a reputation as an excellent language for building complex embedded software systems, especially safety-critical and real-time applications. Ada also provides "two-dimensional" object-oriented programming: types can be extended via inheritance, and packages can be extended via child packages. This unique design allows extension of both the type and the encapsulation mechanisms.
These features make Ada an excellent OOPL, but Ada has two additional features that make it the right choice for building an OODBMS:
Storage Pools - Ada's storage pools allow seamless access to persistent storage. No special pre-processors or language extensions are required.
Built-in Concurrent Programming - Ada's concurrent programming features (tasks, protected types, etc.) readily support the construction of a database concurrency control mechanism.

Ada Features That Support OODBMS

Storage Pools

Ada's storage pools allow an application to replace the Ada heap manager with a user-defined storage manager. Storage pools are defined on access types, which means that Ada's built-in allocator (new) and Ada.Unchecked_Deallocation will allocate and deallocate objects from a user-defined storage manager just as they do from the Ada heap. This makes the inclusion of a user-defined storage management system "seamless", because it is accessed the same way as Ada's built-in storage manager, i.e., the Ada heap. The Ada 95 committee saw storage pools as a way to allow an application to use a specialised main memory heap manager instead of the general-purpose heap manager supplied by an Ada compiler vendor. There is nothing in the storage pool interface, however, that prevents it from being used to manage persistent storage as well as main memory.

All user-defined storage pool types must be derived from the Root_Storage_Pool type, which has no visible components. The Allocate and Deallocate procedures both have a Storage_Address parameter that contains a main memory address for the object in the storage pool. In both cases, this parameter could pass the address of a cache buffer that contains the object. The cache manager and the persistent input and output procedures used to access the object need not be visible to a user of the storage pool. Figure 2 shows an example that uses this approach.

Figure 2. Example usage of storage pools to access persistent storage ("--" marks comments).

```ada
--  In a package specification (the full view of Persistent_Pool goes in
--  the private part):
type Persistent_Pool is new System.Storage_Pools.Root_Storage_Pool with private;  -- Declare pool type.

--  In the application. Aircraft, Take_Off, Transaction, Set_Up,
--  Begin_Transaction and Commit are assumed to be declared elsewhere.
declare
   My_Pool : Persistent_Pool;  -- Create pool object.
   type Persistent_Aircraft_Ptr is access Aircraft;
   for Persistent_Aircraft_Ptr'Storage_Pool use My_Pool;  -- Specify pool for an access type.
   type Heap_Aircraft_Ptr is access Aircraft;  -- No pool clause: uses the Ada heap.
   My_Persistent_AC : Persistent_Aircraft_Ptr;
   My_Heap_AC       : Heap_Aircraft_Ptr;
   T                : Transaction;
begin
   Set_Up (My_Pool, 50_000);  -- Set up pool for 50 KB.
   T := Begin_Transaction;    -- Start database transaction.
   My_Persistent_AC := new Aircraft;  -- Allocate for Persistent_Pool type called:
   --  storage block allocated on persistent media,
   --  Ada pointer is set to point to a cache buffer.
   My_Heap_AC := new Aircraft;  -- Ada heap allocator called. Pointer is set
   --  to point to the object in the Ada heap.
   Take_Off (My_Persistent_AC.all);
   Take_Off (My_Heap_AC.all);
   Commit (T);  -- End of transaction.
end;
```

As shown in Figure 2, the operations of a type are not affected by the storage pool from which the type's instances are allocated. The use of storage pools to access storage of differing persistence therefore supports a key OODBMS concept: the independence of persistence and type. An OODBMS must have this property to be compliant with the ODMG object model.

If an application uses an Ada pointer that refers to a cache buffer, the application must be assured that the contents of the buffer will not be changed while the pointer is in use. Since the persistent storage pool is being accessed in a database context, the pointer to a cached object must be valid within the scope of its transaction. In other words, the pointer must be assigned after the start of a transaction (Figure 2). (The pointer assignment must come after the Begin_Transaction function is called.) Likewise, the cache buffer it points to must not be altered until the transaction commits or aborts (Figure 2). (The cache buffer is locked until the Commit procedure is called.)
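The following is a minimal, self-contained sketch of a user-defined pool derived from Root_Storage_Pool, under loudly stated assumptions: the hypothetical Arena_Pool hands out blocks from one fixed in-memory Storage_Array that merely plays the role of the cache buffer, and nothing is written to persistent media or tied to a transaction. Because the pool type is declared inside the main procedure, it needs Ada 2005 or later; in Ada 95 it would have to live in a library-level package.

```ada
with System;                  use System;
with System.Storage_Pools;    use System.Storage_Pools;
with System.Storage_Elements; use System.Storage_Elements;
with Ada.Unchecked_Deallocation;
with Ada.Text_IO;             use Ada.Text_IO;

procedure Pool_Demo is

   package Arena_Pools is
      --  Hypothetical pool: blocks come from one fixed buffer (Data), which
      --  stands in for a cache buffer. Nothing is written to disk.
      type Arena_Pool (Size : Storage_Count) is
        new Root_Storage_Pool with record
         Data : Storage_Array (1 .. Size);
         Next : Storage_Offset := 1;  -- first unused element of Data
         Live : Natural := 0;         -- objects allocated and not yet freed
      end record;

      procedure Allocate
        (Pool                     : in out Arena_Pool;
         Storage_Address          : out Address;
         Size_In_Storage_Elements : Storage_Count;
         Alignment                : Storage_Count);

      procedure Deallocate
        (Pool                     : in out Arena_Pool;
         Storage_Address          : Address;
         Size_In_Storage_Elements : Storage_Count;
         Alignment                : Storage_Count);

      function Storage_Size (Pool : Arena_Pool) return Storage_Count;
   end Arena_Pools;

   package body Arena_Pools is
      procedure Allocate
        (Pool                     : in out Arena_Pool;
         Storage_Address          : out Address;
         Size_In_Storage_Elements : Storage_Count;
         Alignment                : Storage_Count)
      is
         Align : constant Integer_Address :=
           Integer_Address (Storage_Count'Max (Alignment, 1));
         Start : Storage_Offset := Pool.Next;
      begin
         --  Skip forward until Data (Start) has the requested alignment.
         while Start <= Pool.Size
           and then To_Integer (Pool.Data (Start)'Address) mod Align /= 0
         loop
            Start := Start + 1;
         end loop;
         if Start > Pool.Size
           or else Start + Size_In_Storage_Elements - 1 > Pool.Size
         then
            raise Storage_Error;  -- arena exhausted
         end if;
         Storage_Address := Pool.Data (Start)'Address;
         Pool.Next := Start + Size_In_Storage_Elements;
         Pool.Live := Pool.Live + 1;
      end Allocate;

      procedure Deallocate
        (Pool                     : in out Arena_Pool;
         Storage_Address          : Address;
         Size_In_Storage_Elements : Storage_Count;
         Alignment                : Storage_Count) is
      begin
         --  A simple arena never reuses space; it only counts frees.
         Pool.Live := Pool.Live - 1;
      end Deallocate;

      function Storage_Size (Pool : Arena_Pool) return Storage_Count is
      begin
         return Pool.Size;
      end Storage_Size;
   end Arena_Pools;

   Arena : Arena_Pools.Arena_Pool (Size => 1_024);

   type Aircraft is record
      Id       : Positive;
      Altitude : Natural;
   end record;

   type Aircraft_Ptr is access Aircraft;
   for Aircraft_Ptr'Storage_Pool use Arena;  -- "new" calls Arena's Allocate
   type Heap_Aircraft_Ptr is access Aircraft; -- ordinary Ada heap

   procedure Free is new Ada.Unchecked_Deallocation (Aircraft, Aircraft_Ptr);

   procedure Take_Off (A : in out Aircraft) is
   begin
      A.Altitude := 1_000;
   end Take_Off;

   In_Pool : Aircraft_Ptr := new Aircraft'(Id => 1, Altitude => 0);
   In_Heap : constant Heap_Aircraft_Ptr := new Aircraft'(Id => 2, Altitude => 0);
begin
   Take_Off (In_Pool.all);  -- same operation, whichever pool holds the object
   Take_Off (In_Heap.all);
   Put_Line ("Live objects in arena:" & Natural'Image (Arena.Live));
   Free (In_Pool);          -- calls Arena's Deallocate, sets In_Pool to null
   Put_Line ("Live objects in arena:" & Natural'Image (Arena.Live));
end Pool_Demo;
```

Take_Off works on both objects in exactly the same way, which is the independence of persistence and type described above. A real persistent pool would additionally have to write blocks to persistent media and keep the cache buffer locked until the enclosing transaction commits; this sketch does neither.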
The use of Begin_Transaction and Commit operations to bound a database transaction is consistent with the ODMG object model.

In summary, Ada's System.Storage_Pools package can be used to create an alternative storage manager type. Access types that have a storage pool representation clause will use the specified storage manager instead of the Ada heap when an allocator (new) or an instance of Ada.Unchecked_Deallocation is invoked. The Allocate and Deallocate procedures of a storage pool type both require the target object to be at a main memory address (type System.Address). For persistent storage, this address could be the starting address of a cache buffer for the object. However, the cache buffer must remain locked, i.e., unalterable, until the enclosing transaction commits or aborts. This design principle allows Ada's storage pools to provide persistent storage for Ada objects.

Built-In Concurrent Programming

Ada has two concurrent programming features, protected types and tasks, which can be used together to build a concurrency control mechanism for an OODBMS.

Protected Types

A concurrency control mechanism allows multiple transactions to access data simultaneously without conflict. The "pessimistic" concurrency control algorithm requires that any transaction that needs to update a piece of data be granted exclusive access to it. This prevents:
• a read transaction from reading data that is being updated by a concurrent update transaction;
• an update transaction from modifying data that is being read by a concurrent read transaction;
• two or more update transactions from modifying the same data at the same time.
Transactions that only need to read the data may access it concurrently.

Ada's protected types use a pessimistic concurrency control scheme to manage access to "protected data" by multiple Ada tasks. A protected type declaration is similar to an Ada package specification.
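The three rules above amount to a shared/exclusive (readers/writer) lock. A minimal sketch of one as an Ada protected type follows; Data_Lock, Balance and the Reader and Updater task types are hypothetical names for illustration, not part of any OODBMS product.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Lock_Demo is

   --  Hypothetical pessimistic shared/exclusive lock for one piece of data.
   protected type Data_Lock is
      entry Start_Read;     -- shared: waits only while an update is active
      procedure End_Read;
      entry Start_Update;   -- exclusive: waits while anyone holds the lock
      procedure End_Update;
      function Readers return Natural;
   private
      Reader_Count : Natural := 0;
      Updating     : Boolean := False;
   end Data_Lock;

   protected body Data_Lock is
      entry Start_Read when not Updating is
      begin
         Reader_Count := Reader_Count + 1;
      end Start_Read;

      procedure End_Read is
      begin
         Reader_Count := Reader_Count - 1;
      end End_Read;

      entry Start_Update when not Updating and Reader_Count = 0 is
      begin
         Updating := True;
      end Start_Update;

      procedure End_Update is
      begin
         Updating := False;
      end End_Update;

      function Readers return Natural is
      begin
         return Reader_Count;
      end Readers;
   end Data_Lock;

   Lock      : Data_Lock;
   Balance   : Integer := 100;                    -- the shared data
   Snapshots : array (1 .. 3) of Integer := (others => -1);

   task type Reader (Id : Positive);
   task body Reader is
   begin
      Lock.Start_Read;
      Snapshots (Id) := Balance;  -- read while holding the shared lock
      Lock.End_Read;
   end Reader;

   task type Updater;
   task body Updater is
   begin
      Lock.Start_Update;
      Balance := Balance + 10;    -- read-modify-write under the exclusive lock
      Lock.End_Update;
   end Updater;

begin
   declare
      R1       : Reader (1);
      R2       : Reader (2);
      R3       : Reader (3);
      Updaters : array (1 .. 5) of Updater;
   begin
      null;  -- leaving this block waits for all eight tasks to terminate
   end;
   Put_Line ("Final balance:" & Integer'Image (Balance));
   Put_Line ("Active readers:" & Natural'Image (Lock.Readers));
end Lock_Demo;
```

Readers hold the lock only in shared mode, so they do not exclude one another, while each Updater waits until no reader or other updater holds it; every snapshot therefore sees a complete update. This simple version can starve updaters if readers keep arriving, so a real lock manager would need an added fairness rule.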
The data that is protected by the concurrency control mechanism (the "protected data") appears in the private part of the type and may only be accessed by the entries, procedures and functions whose specifications appear in the public part of the type. These are known as the type's protected operations. The protected operations work as follows:
Procedures - Procedures in a protected type declaration are known as "protected procedures." When an Ada task invokes a protected procedure on an instance of a protected type, it is given exclusive access to the instance's protected data. The procedure may modify any of the protected data. Any task that invokes any other protected operation on that instance is suspended until the protected procedure completes.
Functions - Functions in a protected type declaration are known as "protected functions." When an Ada task invokes a protected function on an instance of a protected type, it is given nonexclusive, read-only access to the instance's protected data; that is, the function may not modify any of the protected data. Because a protected function cannot modify the protected data, it is safe for any number of Ada tasks to invoke protected functions on an instance of a protected type simultaneously.
Entries - When an Ada task calls an entry of an instance of a protected type, it is given exclusive access to the instance's protected data, and the entry may modify any of it. Unlike a procedure, an entry has a barrier condition: the caller is queued until the barrier is true. Any task that invokes any other protected operation on that instance is suspended until the entry completes.

Figure 3. Protected type example. Instance X holds the protected data Occurred (FALSE/TRUE) and Occurrences (initially 0). Exclusive access: a task that invokes a protected procedure or entry on X, such as X.Signal, may modify X's protected data, and all other tasks attempting to access X are suspended on X's queue until the procedure or entry call completes. Read access: multiple tasks may use the protected function X.GetCount (I := X.GetCount;) to read the data concurrently.

The built-in concurrency control features of protected types make them ideal for constructing transaction tables and other lock management mechanisms for an OODBMS. This is because the "critical regions" in the transaction and lock tables are protected against conflicting updates, whereas tasks that only need to read data from the tables, i.e., tasks that invoke protected functions, will not block each other. Note that this is superior to encapsulating the tables within an Ada task, since an Ada task may service only one rendezvous at a time (tasks are single-threaded), whereas an instance of a protected type can service many concurrent function calls. Protected types are also superior to semaphores for the same reason: a semaphore grants totally exclusive access to a critical region regardless of whether exclusive access is required, which precludes concurrent reads of the protected data.

Tasks

Tasks are the "atoms" of Ada's concurrent programming paradigm; that is, the Ada run-time's priority and dispatching policies operate at the task level. Similarly, transactions are the "atoms" of a database manager's concurrency control policy. Transactions, like tasks, can be executed concurrently. The database manager must schedule transactions according to their priorities and execute them concurrently in a nonconflicting manner. Since Ada already has a task priority system and a run-time scheduler, it would make the most sense for an Ada OODBMS to use the Ada run-time to manage its transactions. This would make the OODBMS less complex internally, since it would not have to include its own scheduler. It would also make the OODBMS easier to use, since there would be only one set of priorities and dispatching policies (Ada's) rather than two.
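To make this concrete, here is a minimal sketch in which each client task runs its transactions one after another, while the Ada run-time schedules the tasks themselves. The protected Transaction_Table, its Begin_Transaction and Commit operations, and the slot numbers used to identify tasks are all hypothetical; an explicit slot keeps the example short, where a real design might use Ada.Task_Identification instead. The table refuses to open a second transaction for a task that already has one open.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Task_Transactions is

   --  Slots 1 and 2 are the client tasks; slot 3 is the main program.
   subtype Slot is Positive range 1 .. 3;
   type Transaction_Id is new Natural;
   type Flag_Table is array (Slot) of Boolean;
   type Id_Table   is array (Slot) of Transaction_Id;

   Nested_Transaction : exception;

   --  Hypothetical table enforcing one active transaction per task.
   protected Transaction_Table is
      procedure Begin_Transaction (Owner : Slot; T : out Transaction_Id);
      procedure Commit (Owner : Slot; T : Transaction_Id);
      function Committed return Natural;
   private
      Active  : Flag_Table := (others => False);
      Current : Id_Table   := (others => 0);
      Last    : Transaction_Id := 0;
      Count   : Natural := 0;
   end Transaction_Table;

   protected body Transaction_Table is
      procedure Begin_Transaction (Owner : Slot; T : out Transaction_Id) is
      begin
         if Active (Owner) then
            raise Nested_Transaction;  -- this task already has one open
         end if;
         Last := Last + 1;
         Active (Owner)  := True;
         Current (Owner) := Last;
         T := Last;
      end Begin_Transaction;

      procedure Commit (Owner : Slot; T : Transaction_Id) is
      begin
         if not Active (Owner) or else Current (Owner) /= T then
            raise Program_Error;       -- not this task's open transaction
         end if;
         Active (Owner) := False;
         Count := Count + 1;
      end Commit;

      function Committed return Natural is
      begin
         return Count;
      end Committed;
   end Transaction_Table;

   --  Inside one task, transactions run serially; the Ada run-time
   --  runs the Client tasks concurrently.
   task type Client (Me : Slot);
   task body Client is
      T1, T2 : Transaction_Id;
   begin
      Transaction_Table.Begin_Transaction (Me, T1);
      --  database operations
      Transaction_Table.Commit (Me, T1);
      Transaction_Table.Begin_Transaction (Me, T2);
      --  database operations
      Transaction_Table.Commit (Me, T2);
   end Client;

   Nested_Rejected : Boolean := False;
begin
   declare
      A : Client (1);
      B : Client (2);
   begin
      null;  -- leaving this block waits for both clients to finish
   end;

   --  The main program (slot 3) tries to open a second transaction.
   declare
      T, U : Transaction_Id;
   begin
      Transaction_Table.Begin_Transaction (3, T);
      Transaction_Table.Begin_Transaction (3, U);  -- rejected
   exception
      when Nested_Transaction =>
         Nested_Rejected := True;
         Transaction_Table.Commit (3, T);
   end;

   Put_Line ("Committed transactions:"
             & Natural'Image (Transaction_Table.Committed));
   Put_Line ("Nested transaction rejected: "
             & Boolean'Image (Nested_Rejected));
end Task_Transactions;
```

Each Client opens and commits two transactions in turn, giving four commits, and the main program's attempt to open a second transaction while one is active is rejected before its first transaction is committed.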
The key to making this work is to enforce a one-to-one correspondence between Ada tasks and OODBMS transactions; that is, an Ada task may have at most one active transaction at a time. To get concurrent execution of two or more transactions, the transactions would have to be executed within separate Ada tasks. This is shown in Figure 4.

Figure 4. Transactions in a one-to-one mapping with Ada tasks. Each of two concurrently executing tasks runs:

```ada
T1 := Begin_Transaction;
--  database operations
Commit (T1);
T2 := Begin_Transaction;
--  database operations
Commit (T2);
```

One active transaction at a time in a task means that transactions in the same task execute serially. Since Ada tasks may execute concurrently, transactions in different tasks also execute concurrently.

An OODBMS that uses the Ada task model to implement concurrent transactions offers the user not only a simpler interface but also a more powerful one. This is because an application could control the queuing and dispatching policies of the Ada run-time if the Ada compiler supports the Real-Time Systems Annex. The features in this annex would be especially important for an Ada OODBMS in a real-time system. If an Ada OODBMS application must satisfy firm or hard real-time requirements, the priorities of its tasks (and thus of its transactions) can be established according to the Rate Monotonic Scheduling (RMS) algorithm. The RMS algorithm assumes that all of the tasks in a system are executed at a constant periodic rate, e.g., "task A is executed every 100 milliseconds." RMS assigns priorities to tasks based on their frequency of execution: the most frequently executed task gets the highest priority, and the least frequently executed task gets the lowest priority.

In summary, Ada's concurrent programming features are well suited to OODBMS construction.
The concurrency control mechanism built into protected types makes them an ideal choice for building the internal tables and critical regions of an OODBMS, because protected types, unlike tasks and semaphores, allow concurrent read operations. Also, if an Ada OODBMS allows a task to have at most one active transaction at any given time, the Ada run-time's task scheduler can serve as the OODBMS transaction scheduler. This makes things simple for the Ada programmer and greatly simplifies the internals of the OODBMS. In addition, the task dispatching behaviour of protected types and the Ada run-time can be optimised for real-time systems via the Real-Time Systems Annex.

Possible Enhancements to Ada

The Ada 95 standard is a quantum leap from its excellent predecessor, Ada 83. Ada 95 provides full support for object-oriented programming and has features that make it convenient to add OODBMS capabilities to the language. Because the first object-oriented database standard was not published until 1993 (ODMG-93), the Ada 95 standard was not drafted with database requirements in mind. The ODMG standard has undergone many fundamental changes in its last two releases, the first in 1994 (ODMG-93, Release 1.1) and the second in 1996. It would therefore not be reasonable to expect the Ada 95 committee to have included special provisions for OODBMS construction in the standard, since at the time there was no mature consensus on what features an OODBMS should have. As more OODBMS products are built, a more mature consensus is beginning to form about what features OODBMSs need.

Interview with Per Hedfors, consultant at ABB Business Systems

Questions

What experience do you have with RDBMSs?
Even if hierarchical databases are still used, the RDBMS is what counts nowadays. BUS uses an RDBMS because it fits the company's demands.

What experience do you have with OODBMSs?
According to Mr.
Hedfors, OODBMSs do not solve any problem for BUS that an RDBMS cannot solve. The main purpose of the company is to build and maintain business systems for administration, concerning data that is not very complex. A couple of years ago Mr Hedfors made an investigation comparing OODBMS and RDBMS, and it turned out that the OODBMS caused a performance loss. Mr Hedfors emphasises that OODBMSs may have improved since then. A big disadvantage of OODBMSs is that there is no standard query language like SQL. If you create an OODBMS with C++, the query language is C++ (unless you want to convert it), and this reduces your freedom.

It seems that OOPLs are very popular right now; how come OODBMSs do not have the same popularity? Why do people choose an RDBMS instead of an OODBMS?
The market is trend-dependent, and right now the RDBMS is what counts. There is also conservatism: you know what you get. When the RDBMS appeared in the late 1970s, it took at least 10 years for it to break through. A company must consider a lot of aspects: the cost of education, the benefits, and so on. BUS mostly uses VB, and sometimes C.

Questions based upon the book "Database Systems", written by Connolly, Begg and Strachan

• The computer industry has changed significantly in the last few years. In database systems, RDBMSs have been used for traditional business applications, such as order processing, inventory control, banking and so on. These existing RDBMSs have proven inadequate for applications whose needs differ from those of traditional business applications. Examples of such applications are:
CAD (Computer-Aided Design): A CAD database stores data relating to mechanical and electrical design covering, for example, aircraft and buildings. The data is characterised by a large number of types, each with a small number of instances. The design may be very large, perhaps with millions of parts.
When a design change occurs, its implications must be propagated through the whole design.
CASE (Computer-Aided Software Engineering): A CASE database stores data relating to the stages of the software development lifecycle: planning, requirements collection and analysis, design, implementation, and so on. The design can be extremely large, and there may be hundreds of staff involved with it.
Is the RDBMS old-fashioned? If so, how does your company solve this problem?
Given the demands of BUS, the RDBMS fits in very well. But Mr Hedfors thinks that in complex systems, such as those mentioned above, OODBMSs are better suited.

• Normalisation generally leads to relations that do not correspond to entities in the real world. The many relations cause fragmentation in the physical representation. This is inefficient, leading to many join operations during a query, and the join is the most costly operation to perform. Do you recognise this, and are there advantages that compensate for it?
The statement above can be quite right. BUS solved this by using components in its design. Everything that concerns a customer is encapsulated in one unit, with its own database (for example), and everything concerning an order, including its database, in another unit. The order knows who its customer is, but there are no relations in the database between these objects. Between these units there is an interface that does not have to change, even if the customer or the order is changed.

• In an RDBMS there is only a fixed set of operations, such as set- and tuple-oriented operations (a tuple is a row in a table), and SQL-92 does not allow new operations to be specified. This is too restrictive to model the behaviour of real-world objects. For example, a GIS (Geographic Information System) application needs operations for distance and intersection. What is your experience?
An RDBMS can store procedures, so you can reuse them; that handles such specific operations.
• "Impedance mismatch" is a big problem, because we are mixing different programming paradigms. SQL is a declarative language that handles rows of data, whereas a high-level language like C is a procedural language that can handle only one row of data at a time. SQL provides the built-in data types Date and Interval, which are not available in traditional programming languages. It has been estimated that as much as 30% of programming effort and code space is expended on this type of conversion. The RDBMS does not provide iteration (while ...) or selection (if ...). Is impedance mismatch a problem for BUS?
You can solve this with logic outside the database; it is not a problem for us.

• Many RDBMSs do not allow the storage of BLOBs, but some RDBMSs can store, for example, a picture as a BLOB. However, the picture can then only be stored and displayed. It is not possible to manipulate the internal structure of the picture, nor is it possible to display or manipulate part of the picture. Do you recognise this?
BUS does not have that problem, but Mr. Hedfors understands that it can be a problem, and that this is where an OODBMS can be useful.

Conclusion

The first thing we did was to divide the task of writing this paper into three parts (history and RDBMS; OODBMS; and whether Ada95 would be a good OOPL for developing an OODBMS). Then we gathered as much information as we could, by searching the Internet and reading books. We often had short meetings where we discussed and shared what we had found. To see whether OODBMSs are used in reality, we interviewed a consultant at a company. We found this subject very interesting, and there is a lot of information, but we had to set certain limitations. We are at a historic point in software system research. Today's operating and database systems are built on designs from a quarter-century ago. These systems do not address today's computing environment naturally, and cannot adapt because they are so large and brittle.
We believe that OODBMSs will be used more and more commonly where RDBMSs are used today. RDBMSs will still be used in many areas, such as business systems and systems that do not require the more complex functionality that an OODBMS provides. This will happen when OODBMSs are accepted as a standard and more developers learn the process of building them.

References

Rob Mattison, "Understanding Database Management Systems", McGraw-Hill, ISBN 0-07-049999-3, 1997.
Thomas Connolly, Carolyn Begg, Anne Strachan, "Database Systems", Addison-Wesley, ISBN 0-201-34287-1, 1998.
Fred R. McFadden, Jeffrey A. Hoffer, Mary B. Prescott, "Modern Database Management", Addison-Wesley, ISBN 0-8053-6054-9, 1999.
Norman H. Cohen, "Ada as a Second Language", McGraw-Hill, ISBN 0-07-011607-5, 1996.
"Ada 95 Reference Manual", International Standard ANSI/ISO/IEC-8652:1995, Intermetrics, Cambridge, Mass., 1995.
http://www.inherit.se/sv/index_db.html
http://www.cetus-links.org/oo_data_bases.html
http://www.cpsc.ucalgary.ca/~kremer7courses/547/oodb/index.htm
http://www.adahome.com

Questions
1. What are the advantages and disadvantages of object-oriented databases?
2. Why are tasks in Ada perfect for building databases?
3. What is the goal of normalisation?