Object- Oriented databases - Research

University: Mälardalens högskola
Institution: IDT
Course: Object Oriented Development, advanced course
Lecturer: Mikael Sandberg
Date: 2000-05-18
Object-Oriented Databases
Halima Maronsi hmi98001@student.mdh.se
David Löfqvist dlt99001@student.mdh.se
Monica Larsson mln99005@student.mdh.se
Summary
We are three students attending the course Object Oriented Development, advanced course, at
Mälardalens högskola. In this report we briefly explain the mysteries of object-oriented
database systems. The report covers several aspects of databases: we begin with a short
history lesson, then look at relational database systems, object-oriented modelling,
object-oriented database development, and Ada 95 together with databases, and we end with an
interview with a consultant working at ABB.
Database systems (DBS) play an increasingly important role in firms, companies and other
organisations. Global networking increases their importance further, and "data autobahns"
mean a great deal for today's communication and economy.
At the same time, the systematic implementation of DBS is becoming more and more necessary
because of growing information volumes, the complexity of users' requests, and the
distribution and sharing of information over the Internet. An overview of the history of
databases is the first step into this world.
List of content
1. Summary
2. Dictionary and used abbreviations
3. History
4. Relational Data Bases
4.1 What's the difference between relational and object oriented systems?
4.2 What is relational?
4.3 What is a database server?
4.4 Primary and Foreign key
4.5 Referential integrity
4.6 Normalisation
5. Object oriented modelling
5.1 Introduction
5.2 The phases of an object oriented system development model
5.3 Modelling the application, keywords and explanations
5.3.1 Class
5.3.2 Object
5.3.3 State
5.3.4 Behaviour
5.3.5 Associations, or relationships
6. OODBMS
6.1 Different approaches to develop an OODBMS
6.2 The OODBMS manifesto
6.3 OODBMS advantages
6.4 OODBMS disadvantages
6.5 OO database development, code examples
6.5.1 Defining a class
6.5.2 Defining a relationship
6.5.3 Defining generalisation
6.5.4 Create an object
6.5.5 OQL examples
6.5.6 Select
6.5.7 Aggregate operators
6.5.8 Grouped by
7. Ada a choice for OODBMS?
7.1 Ada Features That Support OODBMS
7.2 Built-In Concurrent Programming
7.3 Tasks
7.4 Possible Enhancements to Ada
8. Interview with a consultant at ABB BUS
9. Conclusion
10. References
A small dictionary
Database = An organised collection of logically related data.
Views = A view is a virtual relation that does not actually exist in the database but is
produced upon request. Views provide a powerful and flexible security mechanism by hiding
parts of the database from certain users.
Persistence = The storage of the objects is managed by the OODBMS, and the objects survive
after the user session or application program has terminated. The opposite is transient: the
objects' memory is allocated and deallocated by the programming language's runtime system,
and the objects last only for the invocation of the program.
Impedance mismatch = When mixing different programming paradigms, we have to convert from
one paradigm to the other.
Transaction = An action or series of actions, carried out by a single user or application
program, which accesses or changes the contents of the database.
Metadata = Data that describe the properties or characteristics of other data.
Database application = An application program that is used to perform a series of database
activities on behalf of database users.
Data Warehouse = An integrated decision support database whose content is derived from the
various operational databases.
Data independence = The separation of data descriptions from the application programs that
use the data.
User View = A logical description of some portion of the database that is required by a user
to perform some task.
Constraint = A rule that cannot be violated by database users.
Used abbreviations
DBMS = Database Management System, a software application that is used to create, maintain,
and provide controlled access to user databases.
OO = Object Oriented
OOP = Object Oriented Programming
DML = Data Manipulation Language
ODL = Object Definition Language, equivalent to the DDL (data definition language) used in
traditional DBMSs.
OQL = Object Query Language, equivalent to the DML (data manipulation language) used in
traditional DBMSs.
DBS = DataBase System
OODBMS = Object-Oriented Database Management System
RDBMS = Relational Database Management System
BLOBs = Binary Large Objects. A BLOB is a data value that contains binary information
representing an image, a digitised video or audio sequence, or a large unstructured object.
BUS = (ABB) Business Systems
ER = Entity-Relationship
VB = Visual Basic
OOPL = Object Oriented Programming Language
History
The historical development of database systems can be divided into three generations:
First generation
The hierarchical and network models (late 1960s and 1970s).
File processing systems were still dominant during the 1960s. The first database management
systems were introduced during that decade and were used primarily for large and complex
ventures such as the Apollo moon-landing project. We can regard this as an experimental
"proof-of-concept" period in which the feasibility of managing vast amounts of data with a
DBMS was demonstrated. The first efforts at standardisation were also made, with the
formation of the Data Base Task Group in the late 1960s.
During the 1970s the use of database management systems became a commercial reality. The
hierarchical and network database management systems were developed largely to cope with
increasingly complex data structures, such as manufacturing bills of materials, that were
extremely difficult to manage with conventional file processing methods. The hierarchical and
network models are generally regarded as first-generation DBMSs. Both approaches were
widely used, and in fact many of these systems continue to be used today. They have some
major disadvantages:
1. Difficult access to data
2. Very limited data independence
3. No widely accepted theoretical foundation for either model.
Second generation
To overcome these limitations, E. F. Codd and others developed the relational data model
during the 1970s. This model, considered the second-generation DBMS, received widespread
commercial acceptance and diffusion during the 1980s. With the relational model, all data are
represented in the form of tables. A relatively simple fourth-generation language called SQL
(Structured Query Language) is used for data retrieval. Thus the relational model provides
ease of access for non-programmers, overcoming one of the major objections to
first-generation systems.
Third generation
The 1990s brought a new era of computing, first with client/server computing and then with
Internet applications becoming increasingly important. Whereas the data managed by a DBMS
during the 1980s was largely structured (such as accounting data), multimedia data (including
graphics, sound, images, and video) became increasingly common during the 1990s. To cope with
these increasingly complex data, object-oriented databases (considered the third generation)
were introduced during the late 1980s. These databases became increasingly important during
the 1990s.
What is the Difference Between Relational and Object-Oriented Database Systems?
[Figure: relational database of a cat]
[Figure: object-oriented database of a cat]
What is Relational?
The word relational, when applied to database systems, has a specific meaning. It refers to a
mathematical and computer science theory developed by Dr. E. F. Codd. The approach proposes a
data representation and storage scheme that uses relational algebra, with its mathematical
and logical properties, as the means of storing and accessing data in a database system.
Relational Database Management System (RDBMS): a collection of integrated services that
together support and control the creation, use and maintenance of relational databases.
A relational database stores all its data inside tables, and nothing more. All operations on
data are done on the tables themselves or produce other tables as the result. You never see
anything except tables.
A table is a set of rows and columns. This is very important, because a set does not have any
predefined sort order for its elements. Each row is a set of columns with only one value for
each. All rows from the same table have the same set of columns, although some columns
may have NULL values, i.e. the value for that row was never initialised. Note that a NULL
value for a string column is different from an empty string. You should think of a NULL
value as an "unknown" value.
A row of a relational table is analogous to a record, and a column to a field. Here is an
example of a table and the SQL statement that creates it:
CREATE TABLE ADDR_BOOK (
NAME char(30),
EDUCATION char(25),
E_MAIL char(25)
)
NAME            EDUCATION             E_MAIL
David Löfqvist  Computer_Science      dlt99001@student.mdh.se
Monica Larsson  Computer_Science      mln99005@student.mdh.se
Halima Maronsi  Computer_Engineering  hmi98001@student.mdh.se
There are two basic operations you can perform on a relational table. The first one is
retrieving a subset of its columns. The second is retrieving a subset of its rows. Here are
samples of the two operations:
SELECT NAME, E_MAIL FROM ADDR_BOOK
NAME            E_MAIL
David Löfqvist  dlt99001@student.mdh.se
Monica Larsson  mln99005@student.mdh.se
Halima Maronsi  hmi98001@student.mdh.se
SELECT * FROM ADDR_BOOK WHERE EDUCATION = 'Computer_Engineering'
NAME            EDUCATION             E_MAIL
Halima Maronsi  Computer_Engineering  hmi98001@student.mdh.se
You can also combine these two operations, as in:
SELECT NAME, E_MAIL FROM ADDR_BOOK WHERE EDUCATION = 'Computer_Engineering'
NAME            E_MAIL
Halima Maronsi  hmi98001@student.mdh.se
You can also perform operations between two tables, treating them as sets: you can form the
Cartesian product of two tables, take the intersection of two tables, add one table to
another (union), and so on.
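As a rough illustration, the operations described above can be tried out with Python's built-in sqlite3 module. This is only a sketch: SQLite stands in for whatever RDBMS is actually used, and the second table OO_COURSE with its contents is invented here just to have something to combine with ADDR_BOOK.

```python
import sqlite3

# In-memory database; ADDR_BOOK mirrors the table defined in the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ADDR_BOOK (NAME char(30), EDUCATION char(25), E_MAIL char(25))"
)
conn.executemany(
    "INSERT INTO ADDR_BOOK VALUES (?, ?, ?)",
    [
        ("David Löfqvist", "Computer_Science", "dlt99001@student.mdh.se"),
        ("Monica Larsson", "Computer_Science", "mln99005@student.mdh.se"),
        ("Halima Maronsi", "Computer_Engineering", "hmi98001@student.mdh.se"),
    ],
)

# Retrieving a subset of the columns.
projection = conn.execute("SELECT NAME, E_MAIL FROM ADDR_BOOK").fetchall()

# Retrieving a subset of the rows.
selection = conn.execute(
    "SELECT * FROM ADDR_BOOK WHERE EDUCATION = ?", ("Computer_Engineering",)
).fetchall()

# Hypothetical second table (an assumption, not part of the report).
conn.execute("CREATE TABLE OO_COURSE (NAME char(30))")
conn.executemany(
    "INSERT INTO OO_COURSE VALUES (?)",
    [("Monica Larsson",), ("Halima Maronsi",)],
)

# Set operations between the two tables.
intersection = conn.execute(
    "SELECT NAME FROM ADDR_BOOK INTERSECT SELECT NAME FROM OO_COURSE"
).fetchall()
union = conn.execute(
    "SELECT NAME FROM ADDR_BOOK UNION SELECT NAME FROM OO_COURSE"
).fetchall()
# Cartesian product: every ADDR_BOOK row paired with every OO_COURSE row.
product_size = conn.execute(
    "SELECT COUNT(*) FROM ADDR_BOOK CROSS JOIN OO_COURSE"
).fetchone()[0]

print(projection, selection, intersection, union, product_size)
```

Every result is again a list of rows, which matches the point that operations on tables produce tables.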
What is a database server?
A database server is a specialised process that manages the database itself. The applications
are clients of the database server; they never manipulate the database directly, but only
make requests for the server to perform operations on it.
This allows the server to add many sophisticated features, such as transaction processing,
recovery, backup and access control, without increasing the complexity of every
application. The server also reduces the risk of data file corruption, if only because the
server alone writes to the database (a crash on a client machine will not leave unflushed
buffers).
The key concepts that you must understand in order to design a database properly are
primary and foreign keys, which are used to define relationships; referential integrity,
which is used to maintain the validity of the data; and normalisation, which is used to
develop the data structure. Once you have these concepts down, the rest of the details will
fall into place more easily.
Primary and Foreign Keys
A key is simply a field that can be used to identify a record. Each row in a table
corresponds to one item or record. The position of a row may change whenever rows are
added or deleted, so items stored in a table can be identified only by their values.
To identify an item uniquely, it must be given a field or set of fields that is
guaranteed to be unique within its table. This is called the primary key of the table.
For example, say Monica Larsson has changed her e-mail address. How do we know which row
to update? Given the table ADDR_BOOK presented earlier:
UPDATE ADDR_BOOK SET E_MAIL = 'mln99005@hotmail.com'
WHERE NAME = 'Monica Larsson'
The column NAME identifies each row of ADDR_BOOK, so NAME is said to be the primary key of
the table ADDR_BOOK.
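A small sketch of what declaring NAME as the primary key buys, again using Python's sqlite3 module as a stand-in for a real RDBMS. The PRIMARY KEY clause and the second e-mail address are assumptions added for the illustration; the CREATE TABLE statement in the report does not declare a key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NAME is declared as the primary key, so the engine rejects duplicate names.
conn.execute(
    "CREATE TABLE ADDR_BOOK ("
    " NAME char(30) PRIMARY KEY, EDUCATION char(25), E_MAIL char(25))"
)
conn.execute(
    "INSERT INTO ADDR_BOOK VALUES (?, ?, ?)",
    ("Monica Larsson", "Computer_Science", "mln99005@student.mdh.se"),
)

# The key value pins down exactly one row to update.
cursor = conn.execute(
    "UPDATE ADDR_BOOK SET E_MAIL = ? WHERE NAME = ?",
    ("mln99005@hotmail.com", "Monica Larsson"),
)
rows_updated = cursor.rowcount

# A second row with the same key value violates uniqueness.
try:
    conn.execute(
        "INSERT INTO ADDR_BOOK VALUES (?, ?, ?)",
        ("Monica Larsson", "Computer_Engineering", "someone@example.com"),  # hypothetical row
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

new_mail = conn.execute(
    "SELECT E_MAIL FROM ADDR_BOOK WHERE NAME = ?", ("Monica Larsson",)
).fetchone()[0]
print(rows_updated, duplicate_rejected, new_mail)
```

In practice a person's name is a fragile key, since two people can share one; a generated number, like the invoice and customer numbers used later, is the more common choice.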
If a row in one table contains the primary key of a row in another table, this establishes a
relationship between the two items; such a field is called a foreign key.
Example:
Invoice
Invoice number  Customer number  Date
2345            2332             5/10/00
2356            4321             3/2/00
Invoice number is the primary key of the invoice table. Customer number is a foreign key
referring to a row in the customers table:
Customers
Customer number  Name
2332             Halima Maronsi
1432             Monica Larsson
Referential integrity
Let's consider what happens when you start manipulating the records involved in an order
entry system. You can edit the customer information at will without any ill effects, but what
would happen if you needed to delete a customer? If the customer has orders, the orders will
be orphaned. Clearly you can't have an order placed by a non-existent customer, so something
must enforce that for each order there is a corresponding customer. This is the basis of
enforcing referential integrity. There are two ways to enforce the validity of the data in
this situation: one is to cascade deletions through the related tables, the other is to
prevent deletions when related records exist.
Database applications have several choices available for enforcing referential integrity, but
if possible, you should let the database engine do its job and handle this for you. Modern
database engines allow you to use declarative referential integrity: you specify a
relationship between tables at design time, indicating whether updates and deletes will
cascade through related tables. If cascading updates are enabled, changes to the primary key
in a table are propagated through related tables. If cascading deletes are enabled, deletions
from a table are propagated through related tables.
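The two enforcement strategies can be sketched with declarative foreign keys in SQLite, driven from Python. The column names CUSTOMER_NO and INVOICE_NO and the helper make_db are assumptions made for this example; note also that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued, which other engines do not require.

```python
import sqlite3


def make_db(on_delete):
    """Build a small CUSTOMERS/INVOICE database with the given ON DELETE rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
    conn.execute(
        "CREATE TABLE CUSTOMERS (CUSTOMER_NO INTEGER PRIMARY KEY, NAME TEXT)"
    )
    conn.execute(
        "CREATE TABLE INVOICE ("
        " INVOICE_NO INTEGER PRIMARY KEY,"
        " CUSTOMER_NO INTEGER REFERENCES CUSTOMERS(CUSTOMER_NO)"
        " ON DELETE " + on_delete + ")"
    )
    conn.execute("INSERT INTO CUSTOMERS VALUES (2332, 'Halima Maronsi')")
    conn.execute("INSERT INTO INVOICE VALUES (2345, 2332)")
    return conn


# Strategy 1: cascading deletes remove the customer's invoices as well.
cascade_db = make_db("CASCADE")
cascade_db.execute("DELETE FROM CUSTOMERS WHERE CUSTOMER_NO = 2332")
invoices_left = cascade_db.execute("SELECT COUNT(*) FROM INVOICE").fetchone()[0]

# Strategy 2: the delete is refused while related invoices exist.
restrict_db = make_db("RESTRICT")
try:
    restrict_db.execute("DELETE FROM CUSTOMERS WHERE CUSTOMER_NO = 2332")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

# An invoice for a customer number that is not in CUSTOMERS is refused too.
try:
    restrict_db.execute("INSERT INTO INVOICE VALUES (2356, 4321)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(invoices_left, delete_blocked, orphan_rejected)
```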
Normalisation
Normalisation is the process of refining the structure of the database until repeating groups
of data have been moved into separate tables. A good guide to designing relational databases
is the set of rules that define the first three normal forms:
1. All column values are atomic.
2. All column values depend on the value of the primary key.
3. No column value depends on the value of any other column except the primary key.
When you have applied the three rules, the database is said to be in Third Normal Form
(3NF), or simply "normalised". A normalised database generally improves performance,
lowers storage requirements, and makes it easier to change the application to add new
features.
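To make the third rule concrete, here is a minimal sketch in plain Python. The rows are hypothetical: a flat invoice list in which the customer name depends on the customer number rather than on the invoice number, so it is repeated on every invoice.

```python
# Hypothetical unnormalised rows: (invoice_no, customer_no, customer_name, date).
flat_invoices = [
    (2345, 2332, "Halima Maronsi", "5/10/00"),
    (2356, 2332, "Halima Maronsi", "3/2/00"),
]

# CUSTOMERS: the name depends only on the customer number, so it moves to
# its own table and is stored once.
customers = {cust_no: name for _, cust_no, name, _ in flat_invoices}

# INVOICE: every remaining column depends on the invoice number alone.
invoices = [(inv_no, cust_no, date) for inv_no, cust_no, _, date in flat_invoices]

# Joining the two tables on the customer number reproduces the original rows,
# so no information was lost by splitting them.
rejoined = [
    (inv_no, cust_no, customers[cust_no], date)
    for inv_no, cust_no, date in invoices
]

print(customers)
print(invoices)
print(rejoined == flat_invoices)
```

After the split, correcting a customer's name means changing one entry instead of every invoice row that mentions it.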
Object oriented modelling
The object-oriented approach is becoming popular because it supports an effective
representation of real-world applications, it can represent complex relationships, and it can
represent data and data processing in a consistent notation.
An object-oriented model is built from objects, whereas an ER model (Entity-Relationship
model) uses entities. An object encapsulates both data and behaviour, so the object-oriented
model can be used for data modelling as well as process modelling.
The phases of an object-oriented system development model
The model moves from the abstract, focusing on the external qualities of the system, to the
more and more detailed, focusing on how the system will be built and how it should function.
Analysis: Develop a model of the real-world application showing its important
properties. Abstract concepts from the application domain and describe what the
system will do, rather than how it will be done. Structure the requirements, and really
understand them. The analysis model specifies the functional behaviour of the system,
independent of the environment.
Design: The design phase looks at how the analysis model will be implemented in its
environment: what operations an object provides, what sort of communication takes place
between objects, what messages will be sent, and so on. It defines an overall system
architecture and organises the system into components called subsystems. The model is built
out by adding implementation details, data structures, algorithms and control.
Implementation: Implement the design using a programming language and the database
management system.
Coad and Yourdon (1991b) identify several motivations for and benefits of object-oriented
modelling, for example improved communication between users, analysts, designers and
programmers, and the ability to tackle more complex problem domains.
The UML notation can be used to depict an OO analysis or design model graphically.
Modelling the application: some keywords and explanations
In an ER model an entity can be seen as an object. But an object stores both state and the
behaviour that affects or examines that state.
Class: A kind of template describing how objects will be created and how they will be
represented in terms of state and behaviour. The class is supposed to encapsulate the
internal state. A class can be abstract or concrete. An abstract class leaves some or all of
its methods unimplemented and expects its subclasses to implement them. A concrete class
implements all of its methods.
Object: An entity that has a well-defined role in the application. It has state, behaviour
and identity. A key part of the definition of an object is its unique identity. In an
object-oriented system each object is assigned an OID (Object Identifier) when it is created.
The OID is system-generated and unique to that object. Once the object is created, its OID
will never be reused, even if the object is deleted. The OID is independent of the values of
the attributes, and it should be invisible to the user. Objects communicate by sending
messages. A message is simply a request from one object to another. An object sending a
message to another object does not have to know anything about the receiver's internal state,
and that is what encapsulation is about.
State: The object's properties (attributes and relationships) and the values of those
properties.
Behaviour: How an object acts and reacts, i.e. the operations or methods the object provides.
There are three types of operations:
- Constructor operations, which create a new instance of a class.
- Query operations, which access the state.
- Update operations, which alter the state of an object.
An operation can also be abstract: it defines the form of the operation, but not the
implementation. Methods can be overridden when using inheritance.
[Figure: class and object]
Class:   Student
         Name
         dateOfBirth
         ...
         getAge()
         getDateOfBirth()
Object:  Steve: Student
         Name = Steve
         DateOfBirth = ...
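The Student class and the Steve object can be sketched in Python as follows. The method names follow the diagram, adapted to Python naming; the birth date, the set_name and get_name operations and the explicit today parameter are assumptions added so that the example has all three kinds of operation and a reproducible result.

```python
from datetime import date


class Student:
    """Class: the template from which Student objects are created."""

    def __init__(self, name, date_of_birth):
        # Constructor operation: creates a new instance and sets its state.
        self._name = name
        self._date_of_birth = date_of_birth

    def get_name(self):
        # Query operation: reads the state without changing it.
        return self._name

    def get_date_of_birth(self):
        # Query operation.
        return self._date_of_birth

    def get_age(self, today):
        # Query operation: derives the age from the stored date of birth.
        born = self._date_of_birth
        had_birthday = (today.month, today.day) >= (born.month, born.day)
        return today.year - born.year - (0 if had_birthday else 1)

    def set_name(self, name):
        # Update operation: alters the state of the object.
        self._name = name


# Two objects with identical state are still two distinct objects.
steve = Student("Steve", date(1980, 5, 18))  # hypothetical birth date
twin = Student("Steve", date(1980, 5, 18))

age = steve.get_age(date(2000, 5, 18))
same_object = steve is twin

twin.set_name("Stephen")  # changes twin only; steve keeps its own state
print(age, same_object, steve.get_name(), twin.get_name())
```

Python's `is` compares identity rather than attribute values, which is close in spirit to comparing OIDs, although Python makes no promise that an identity is never reused after an object is deleted.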
Associations, or relationships
An association is a relationship between objects. An association can be unary, relating
objects of the same class; binary, between two objects; or ternary, among three objects. A
relationship can have different multiplicities, which indicate how many objects participate
in it.
Unary relationship:
[Figure: Person 0..1 Is-married-to 0..1 Person]
In a binary relationship the multiplicity between objects can be one-to-one, one-to-many
or many-to-many.
An association between objects can have attributes and operations of its own, here
represented with a dashed line.
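[Figure: many-to-many (* to *) association between Student and Course, with the association
class Registration (attribute grade, operation getGrade()) attached by a dashed line]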
An association can also be an aggregation, a part-of relationship between objects, also known
as has-a or composition. For example, a PC has-a CPU.
The OO model expresses generalisation relationships using superclasses and subclasses.
[Figure: classes Animal and Mammal]
[Figure: classes Car, Boat and Amphibious]
Inheritance can also be multiple. Inheritance is a very powerful mechanism because it
supports code reuse and provides polymorphism. Polymorphism is a key concept in OO systems.
There are three types of polymorphism: operation, inclusion and parametric. A method defined
in a superclass and inherited by its subclasses is an example of inclusion polymorphism.
Parametric polymorphism, or genericity, uses a generic description as a template for the
later establishment of one or more different types. Operation polymorphism, or overloading,
allows the name of a method to be reused a) within a class definition or b) across class
definitions.
a. Within a class definition, which method is executed depends on what parameters are passed
to the method.
b. Across class definitions, for example, a superclass has an abstract method print(), and
the subclasses each have a print() method too. A variable declared as the superclass can hold
a value of any of the subclasses. Which print() method is executed depends on which subclass
value the superclass variable holds.
[Figure: class Shape with print(), and classes Circle and Rectangle, each with print()]
The process of selecting the appropriate method based upon an object's type is called
binding. There is static binding and dynamic binding. Static, or early, binding is performed
at compile time. Dynamic binding is performed at run time, as in example b) above.
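Example b) can be sketched in Python with the Shape, Circle and Rectangle classes from the figure. Two caveats: Python variables are not declared with a type, so every call is bound at run time, and here print() returns its text instead of writing it out, a choice made only to keep the sketch easy to check.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def print(self):
        """Abstract operation: the form is defined, the implementation is not."""


class Circle(Shape):
    def print(self):
        return "Circle"


class Rectangle(Shape):
    def print(self):
        return "Rectangle"


# Each element is handled as a Shape; the print() that actually runs is
# selected at run time from the object's real class (dynamic binding).
shapes = [Circle(), Rectangle()]
printed = [shape.print() for shape in shapes]

# The abstract class itself cannot be instantiated.
try:
    Shape()
    abstract_instantiable = True
except TypeError:
    abstract_instantiable = False

print(printed, abstract_instantiable)
```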
OODBMS
Object-oriented approaches were first developed as a result of research into more effective
programming techniques. Out of this work, a set of criteria for the development of
object-oriented databases has begun to emerge. Development of object-oriented databases is
still in its early stages. Because the technique is new and programming expertise is lacking,
production use has been restricted to a small number of applications. Object-oriented
databases are best suited to environments that are extremely complex but have well-defined
operating parameters.
Object orientation is a "recent" approach to software construction that shows considerable
promise for solving some of the classic problems of software development. The underlying
concept behind object technology is that all software should be constructed out of standard,
reusable components wherever possible. Traditionally, software engineering and database
management have existed as separate disciplines. Database technology has concentrated on
the static aspects of information storage, while software engineering has modelled the
dynamic aspects. With the third generation of databases, the two disciplines have been
combined to allow the concurrent modelling of both data and the processes acting upon the
data.
Database systems are often concerned with the creation and maintenance of large, long-lived
collections of data. Modern database systems support the following features:
• The data model: a particular way of describing data, relationships and constraints.
• Data persistence: the data can be stored "forever".
• Data sharing: multiple applications or users can access the data at the same time.
• Reliability: the data is protected from hardware and software failures.
• Scalability: the ability to operate on large amounts of data.
• Security and integrity: the data is protected from unauthorised access, and its correctness
and consistency are assured.
• Distribution: collections of shared data are physically distributed over a computer
network, and the distribution is transparent to the user.
Future
Industry experts predict that OODBMSs represent the most promising of the emerging
database systems. For traditional business applications, relational DBMSs are expected
to maintain their hold on the market. But in many applications, such as CAD, CASE and GIS,
accessing the data requires joins across various tables, which are extremely costly. Such
applications need support that an RDBMS cannot give but an OODBMS can.
There are several approaches to developing an OODBMS:
1. Extend an existing OOPL with database capabilities. Traditional database capabilities
are added to the language.
2. Provide extensible OODBMS libraries. Rather than extending the language, class libraries
are provided that support persistence, transactions and so on.
3. Embed OO database language constructs in a conventional host language, for example C.
4. Extend an existing database language with OO capabilities, as in the next release of the
SQL standard, SQL3.
5. Develop a new database data model/data language. This is a radical approach: develop an
entirely new database language and an entirely new OODBMS.
An attempt to establish a standard for OODBMSs is the OODBMS Manifesto (Atkinson et al.,
1989a). The manifesto proposes 13 mandatory features:
1. Complex objects must be supported.
2. Object identity must be supported.
3. Encapsulation must be supported.
4. Classes must be supported.
5. Classes must be able to inherit from their ancestors.
6. Dynamic binding must be supported.
7. The DML must be computationally complete, and be a general-purpose programming language.
8. The set of data types must be extensible. There must be no distinction in usage between
system-defined and user-defined types.
9. Data persistence must be provided.
10. The DBMS must be capable of managing very large databases.
11. The DBMS must support concurrent users.
12. The DBMS must be able to recover from hardware and software failures.
13. The DBMS must provide a simple way of querying data.
There are some advantages of OODBMSs:
1. Enriched modelling capabilities. The OO model allows the 'real world' to be modelled more
closely. An object can store all the relationships it has with other objects, including
many-to-many relationships, and objects can be formed into complex objects.
2. Extensibility. New data types can be built from existing types (superclass and subclass).
This can reduce redundancy, and with overriding, special cases can be handled easily. The
reusability of classes promotes faster development and easier maintenance.
3. Removal of impedance mismatch.
4. More expressive query language. Navigational access from one object to another is
provided, whereas SQL has associative access. Navigational access is better suited to
handling parts explosion, recursive queries and so on.
5. Support for schema evolution. The coupling between data and applications in an OODBMS
makes schema evolution easier. Generalisation and inheritance allow the schema to be better
structured, to be more intuitive, and to capture more of the semantics of the application.
6. Support for long-duration transactions.
7. Applicability to advanced database applications.
8. Improved performance.
There are some disadvantages of OODBMSs, too:
1. Lack of a universal data model.
2. Lack of experience.
3. Lack of standards.
4. Query optimisation compromises encapsulation.
5. Locking at object level may impact performance.
6. Complexity. The increased functionality provided by an OODBMS makes it more complex than
a traditional DBMS. The complexity can lead to products that are more expensive and more
difficult to use.
7. Lack of support for views.
8. Lack of support for security. If OODBMSs are to expand fully into the business field,
this deficiency must be remedied.
OO- Database development
A conceptual OO model (described in UML, for example) can be transformed into a logical ODL
schema. Here we show some examples.
Keywords are bold.
Defining a class:
class Student {
    attribute string name;
    attribute Date dateOfBirth;        /* structured datatype */
    attribute Address address;         /* user-defined structure */
    relationship set<CourseOffering> takes inverse CourseOffering::taken_by;
    short age();                       /* operation */
};
Defining a user structure:
struct Address {
    string street_address;
    string city;
    string country;
};
Defining a relationship:
[Figure: a Student and a CourseOffering connected by the relationship takes, with the inverse
taken_by]
Traversing from the Student to the CourseOffering gives the relationship takes.
Traversing in the opposite direction gives the relationship taken_by.
In the CourseOffering class the relationship is specified as:
relationship set<Student> taken_by inverse Student::takes;
Defining generalisation:
[Figure: classes Employee, ExtraEmployee and RegularEmployee]
class Employee
( extent employees )
{
    //...
};
class ExtraEmployee extends Employee
( extent ex_emp )
{
    //...
};
class RegularEmployee extends Employee
( extent reg_emp )
{
    //...
};
The extent of a class is the set of all instances of that class within the database; the
keyword extent gives this set a name. The extent ex_emp refers to all the ExtraEmployee
instances in the database.
Creating object instances:
David Student {name ”David Löfqvist”, dateOfBirth //..}
Here we show some examples of OQL.
Keywords are bold.
David.dateOfBirth      /* returns David's date of birth */
David.address.city     /* returns Västerås */
You can also use the select statement:
select s.name
from students s
where s.address.city = "Västerås"
There are some aggregate operators you can use: count, sum, avg, max and min.
count(students)                          /* counts all instances of Student */
max(select e.salary from employees e)
You can partition a query result into chosen groups:
select max(e.salary)
from employees e
group by e.gender
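For readers more used to a programming language than to OQL, the aggregate and grouping queries above can be mimicked over an in-memory list of objects in Python. This is only an analogy: the Employee class, the three sample employees and the salary threshold are invented for the sketch, and a plain list stands in for the extent employees.

```python
class Employee:
    def __init__(self, name, gender, salary):
        self.name = name
        self.gender = gender
        self.salary = salary


# Hypothetical extent: all Employee instances "in the database".
employees = [
    Employee("Employee A", "F", 30000),
    Employee("Employee B", "M", 28000),
    Employee("Employee C", "F", 32000),
]

# count(employees)
employee_count = len(employees)

# max(select e.salary from employees e)
max_salary = max(e.salary for e in employees)

# select e.name from employees e where e.salary > 29000
well_paid = [e.name for e in employees if e.salary > 29000]

# select max(e.salary) from employees e group by e.gender
max_by_gender = {}
for e in employees:
    current = max_by_gender.get(e.gender, e.salary)
    max_by_gender[e.gender] = max(current, e.salary)

print(employee_count, max_salary, well_paid, max_by_gender)
```

The dotted paths such as e.salary are the same navigation through object attributes that OQL uses; what an OODBMS adds is that the extent is persistent and shared rather than a list in one program's memory.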
Ada a choice for Object-oriented databases?
Object-oriented programming languages (OOPLs) allow application developers to write
object-oriented programs that run in main memory. More complex object-oriented
applications require persistent storage and concurrency control for their objects.
Object-oriented database management systems (OODBMSs) were created to satisfy these
requirements. An OODBMS adds persistence and concurrency control to an existing OOPL.
The Object Data Management Group (ODMG) has published a standard (ODMG-93, release
1.2) for OODBMSs written in C++ and Smalltalk. These are capable OOPLs, but Ada 95 has
features that make it a superior choice for building OODBMSs. Unlike C++ or Smalltalk, the
Ada language has built-in support for concurrent programming, as well as storage pools, which
allow Ada objects to reside in persistent storage.
The complexity required to provide the features desired in many modern products has made
the inclusion of embedded processors and their associated software a necessity. As a result,
more and more of a product's functionality is provided by software. OOPLs have been created
to reduce the complexity and cost of this software.
Modern OOPLs, like their predecessors, are designed to allow an application developer to
create a complex sequence of instructions with minimal difficulty. The "sequence of
instructions" paradigm for computer programs begins to break down when data management is
required. Managing concurrent reads and writes to a piece of data is a complex task, as is
ensuring consistency between related pieces of data. Database managers (DBMSs) are designed
to perform these tasks so that application software does not have to manage its data but can
instead merely use it.
Traditionally, however, the interfaces to DBMSs have been "database languages", e.g. SQL,
which have their own syntax and type systems that are incompatible with those of computer
programming languages. To circumvent this problem, programs that require a database manager
have relied on special mechanisms, e.g. "embedded SQL", to access the database manager, along
with type conversion routines to translate between the type system of the programming
language and the type system of the database language. This adds complexity to the
application software by making it handle two kinds of data: "regular" data (variables, etc.)
and "database data" (objects that must be translated to and from the database language). This
situation is shown on the left side of Figure 1.
[Figure 1 panels: Traditional DBMS; Object-oriented DBMS. Labels: DB, DATA, INTERFACE]
Figure 1. Traditional vs. object database manager in an OOPL application.
OODBMSs, however, are designed to use the same type system and syntax as a given OOPL.
This way, no database language translation is required and, as a result, database objects
look like regular objects (see the right side of Figure 1). An OODBMS, therefore, extends an
OOPL to include database capabilities. These capabilities include:
Persistence - The data continues to exist after the program that created it has terminated,
and the data is not lost if the power is turned off.
Concurrency Control - The data can be accessed for read and write by concurrent programs
without becoming corrupted.
Ada has a reputation as an excellent language for building complex embedded software
systems, especially safety-critical and real-time applications. Ada also provides
"two-dimensional" object-oriented programming: types can be extended via inheritance, and a
package can be extended via child packages. This unique design allows extension of both the
type and the encapsulation mechanisms. These features make Ada an excellent OOPL, but Ada
has two further features that make it the right choice for building an OODBMS:
Storage Pools - Ada's storage pools allow seamless access to persistent storage. No special
pre-processors or language extensions are required.
Built-in Concurrent Programming - Ada's concurrent programming features (tasks, protected
types, etc.) readily support the construction of a database concurrency control mechanism.
Ada Features That Support OODBMS
Storage Pools
Ada's storage pools allow an application to replace the Ada heap manager with a user-defined
storage manager. Storage pools are defined on access types, which means that Ada's built-in
pointer operators will allocate and deallocate objects from a user-defined storage manager as
well as the Ada heap. This makes the inclusion of a user-defined storage management system
"seamless" because it is accessed the same way as Ada's built-in storage manager, i.e., the
Ada heap. The Ada 95 committee saw storage pools as a way to allow an application to use a
specialised main memory heap manager instead of the general-purpose heap manager
supplied by an Ada compiler vendor. There is nothing in the storage pool interface, however,
that prevents it from being used to manage persistent storage as well as main memory.
type Persistent_Pool is new
   System.Storage_Pools.Root_Storage_Pool with private;
--Declare pool type.
My_Pool : Persistent_Pool;
--Create pool object.
type Persistent_Aircraft_Ptr is access Aircraft;
for Persistent_Aircraft_Ptr'Storage_Pool use My_Pool;
--Specify pool for an access type.
type Heap_Aircraft_Ptr is access Aircraft;
My_Persistent_AC : Persistent_Aircraft_Ptr;
My_Heap_AC       : Heap_Aircraft_Ptr;
T                : Transaction;
begin
   --Set up pool for 50 KB.
   Set_Up (My_Pool, 50_000);
   --Start database transaction.
   T := Begin_Transaction;
   My_Persistent_AC := new Aircraft;
   --Allocate for Persistent_Pool type called.
   --Storage block allocated on persistent media.
   --Ada pointer is set to point to cache buffer.
   My_Heap_AC := new Aircraft;
   --Ada heap allocator called. Pointer is set
   --to point to the object in the Ada heap.
   Take_Off (My_Persistent_AC.all);
   Take_Off (My_Heap_AC.all);
   Commit (T);
   --End of transaction.
All user-defined storage pool types must be derived from the Root_Storage_Pool type, which
has no visible attributes. The Allocate and Deallocate procedures both have a
Storage_Address parameter that contains a main memory address for the object in the storage
pool. This parameter could, in both cases, pass the address of a cache buffer that contains the
object. The cache manager and persistent input and output procedures used to access the
object need not be visible to a user of the storage pool. Figure 2 shows an example that uses
this approach.
Figure 2. Example usage of storage pools to access persistent storage ("--" introduces a comment).
As shown in Figure 2, the operations of a type are not affected by the storage pool from which
the type's instances are allocated. The use of storage pools to access storage of differing
persistence therefore supports a key OODBMS concept: the independence of persistence and type.
An OODBMS must have this property to be compliant with the ODMG object model.
If an application uses an Ada pointer that refers to a cache buffer, the application must be
assured that the contents of the buffer will not be changed while the pointer is in use. Since
the persistent storage pool is being accessed in a database context, the pointer to a cached
object must be valid within the scope of its transaction. In other words, the pointer must be
assigned after the start of a transaction (Figure 2). (The pointer assignment must come after
the "Begin_Transaction" function is called.) Likewise, the cache buffer it points to must not
be altered until the transaction commits or aborts (Figure 2). (The cache buffer is locked until
the Commit procedure is called.) The use of "Begin_Transaction" and "Commit" operators to
bound a database transaction is consistent with the ODMG object model.
Ada's System.Storage_Pools package, in summary, can be used to create an alternative
storage manager type. Access types that have a storage pool representation clause will use the
specified storage manager instead of the Ada heap when new or Unchecked_Deallocation are
invoked. The Allocate and Deallocate procedures for a storage pool type both require the
target object to be at a main memory address (type System.Address). For persistent storage,
this address could be the starting address of a cache buffer for the object. However, the cache
buffer must remain locked, i.e., unalterable, until the enclosing transaction commits or aborts.
This design principle allows Ada's storage pools to provide persistent storage for Ada objects.
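As a rough illustration of the storage pool interface summarised above, the sketch below derives a pool type from Root_Storage_Pool and overrides Allocate, Deallocate and Storage_Size, which are the three operations the language requires. The names Buffer_Pools, Buffer_Pool and Pool_Demo are hypothetical, and the pool is only an in-memory buffer standing in for the cache buffer of a persistent store: nothing is written to disk, there are no transactions or locks, and Deallocate does not reclaim space.

```ada
with System;
with System.Storage_Elements;
with System.Storage_Pools;

package Buffer_Pools is
   use System.Storage_Elements;

   --  Hypothetical pool: a fixed buffer handed out front to back.
   type Buffer_Pool (Size : Storage_Count) is
     new System.Storage_Pools.Root_Storage_Pool with record
      Data : Storage_Array (1 .. Size);
      Next : Storage_Count := 1;  --  index of the first free element
   end record;

   procedure Allocate
     (Pool                     : in out Buffer_Pool;
      Storage_Address          : out System.Address;
      Size_In_Storage_Elements : in Storage_Count;
      Alignment                : in Storage_Count);

   procedure Deallocate
     (Pool                     : in out Buffer_Pool;
      Storage_Address          : in System.Address;
      Size_In_Storage_Elements : in Storage_Count;
      Alignment                : in Storage_Count);

   function Storage_Size (Pool : Buffer_Pool) return Storage_Count;
end Buffer_Pools;

package body Buffer_Pools is

   procedure Allocate
     (Pool                     : in out Buffer_Pool;
      Storage_Address          : out System.Address;
      Size_In_Storage_Elements : in Storage_Count;
      Alignment                : in Storage_Count)
   is
      Align : constant Storage_Count := Storage_Count'Max (Alignment, 1);
      Start : Storage_Count := Pool.Next;
      Over  : Storage_Offset;
   begin
      if Start > Pool.Size then
         raise Storage_Error;
      end if;
      --  Skip forward to the next address that satisfies the alignment.
      Over := Pool.Data (Start)'Address mod Align;
      if Over /= 0 then
         Start := Start + (Align - Over);
      end if;
      if Start > Pool.Size
        or else Size_In_Storage_Elements > Pool.Size - Start + 1
      then
         raise Storage_Error;
      end if;
      Storage_Address := Pool.Data (Start)'Address;
      Pool.Next       := Start + Size_In_Storage_Elements;
   end Allocate;

   procedure Deallocate
     (Pool                     : in out Buffer_Pool;
      Storage_Address          : in System.Address;
      Size_In_Storage_Elements : in Storage_Count;
      Alignment                : in Storage_Count) is
   begin
      null;  --  this sketch never reuses space
   end Deallocate;

   function Storage_Size (Pool : Buffer_Pool) return Storage_Count is
   begin
      return Pool.Size;
   end Storage_Size;
end Buffer_Pools;

with Ada.Text_IO;
with System.Storage_Elements; use System.Storage_Elements;
with Buffer_Pools;

procedure Pool_Demo is
   type Aircraft is record
      Altitude : Natural := 0;
   end record;

   My_Pool : Buffer_Pools.Buffer_Pool (Size => 50_000);

   type Pooled_Aircraft_Ptr is access Aircraft;
   for Pooled_Aircraft_Ptr'Storage_Pool use My_Pool;
   type Heap_Aircraft_Ptr is access Aircraft;

   Pooled : Pooled_Aircraft_Ptr;
   Heap   : Heap_Aircraft_Ptr;
begin
   Pooled := new Aircraft;  --  calls Buffer_Pools.Allocate
   Heap   := new Aircraft;  --  uses the default storage pool
   Pooled.Altitude := 10_000;
   Heap.Altitude   := 10_000;
   Ada.Text_IO.Put_Line
     ("Pool space used: " & Boolean'Image (My_Pool.Next > 1));
end Pool_Demo;
```

The block contains three compilation units (package specification, package body and main procedure); with GNAT they can be split into separate files with gnatchop before building. Both pointers are used in exactly the same way after allocation, which is the point the text makes about seamless access.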
Built-In Concurrent Programming
Ada has two concurrent programming features, protected types and tasks, which can be used
together to build a concurrency control mechanism for an OODBMS.
Protected Types
A concurrency control mechanism allows multiple transactions to simultaneously access data
without conflict. The "pessimistic" concurrency control algorithm requires that any
transaction that needs to update a piece of data be granted exclusive access to it. This prevents:
• a read transaction from reading data that is being updated by a concurrent update transaction;
• an update transaction from modifying data that is being read by a concurrent read transaction;
• two or more update transactions from modifying the same data at the same time.
Transactions that only need to read the data may access it concurrently.
Ada's protected types use a pessimistic concurrency control scheme to manage access to
"protected data" by multiple Ada tasks. A protected type declaration is similar to an Ada
package specification. The data that is protected by the concurrency control mechanism (the
"protected data") appears in the private part of the type and may only be accessed by the
entries, procedures, and functions whose specifications appear in the public part of the type.
These are known as the type's protected operations. The protected operations work as follows:
Procedures - Procedures in a protected type declaration are known as "protected
procedures." When an Ada task invokes a protected procedure on an instance of a protected
type, it is given exclusive access to the instance's protected data. The procedure may modify
any of the protected data. Any task that invokes any other protected operation on that instance
is suspended until the protected procedure completes.
Functions - Functions in a protected type declaration are known as "protected functions."
When an Ada task invokes a protected function on an instance of a protected type, it is given
nonexclusive, read-only access to the instance's protected data; that is, the function may not
modify any of the protected data. Because a protected function cannot modify any of the
protected data, it is safe for any number of Ada tasks to simultaneously invoke protected
functions for an instance of a protected type.
Entries - When an Ada task invokes an entry call on an instance of a protected type, it is
given exclusive access to the instance's protected data. The entry may modify any of the
protected data. Any task that invokes any other protected operation on that instance is
suspended until the entry completes.
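The three kinds of protected operation described above can be sketched as a small, complete Ada program. The names Occurred, Occurrences, Signal and Get_Count are borrowed from Figure 3; the Wait entry, the Signaller tasks and the loop counts are assumptions added for illustration. Note that an entry also carries a barrier condition (the "when" clause): a caller is queued until the barrier is true, and only then gets exclusive access.

```ada
with Ada.Text_IO;

procedure Protected_Demo is

   protected type Event_Counter is
      procedure Signal;                   --  exclusive read/write access
      function Get_Count return Natural;  --  shared read-only access
      entry Wait;                         --  exclusive, guarded by a barrier
   private
      Occurred    : Boolean := False;
      Occurrences : Natural := 0;
   end Event_Counter;

   protected body Event_Counter is
      procedure Signal is
      begin
         Occurred    := True;
         Occurrences := Occurrences + 1;
      end Signal;

      function Get_Count return Natural is
      begin
         return Occurrences;
      end Get_Count;

      --  Callers are queued until Occurred becomes True.
      entry Wait when Occurred is
      begin
         Occurred := False;
      end Wait;
   end Event_Counter;

   X : Event_Counter;

   task type Signaller;
   task body Signaller is
   begin
      for I in 1 .. 1_000 loop
         X.Signal;
      end loop;
   end Signaller;

begin
   declare
      Workers : array (1 .. 4) of Signaller;
   begin
      null;  --  the block is left only after all Workers have terminated
   end;
   X.Wait;   --  returns at once because Occurred is True
   Ada.Text_IO.Put_Line ("Occurrences:" & Natural'Image (X.Get_Count));
end Protected_Demo;
```

Because Signal is a protected procedure, the four tasks cannot interleave inside it, so no increment is lost and the program reports 4000 occurrences.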
[Figure 3, two panels for an instance X of a protected type with protected data Occurred and Occurrences.
Exclusive Access: a task invokes a protected procedure or entry for instance X (e.g., X.Signal), which may modify X's protected data. All other tasks attempting to access instance X are suspended on X's queue until the procedure/entry call completes.
Read Access: multiple tasks may use protected functions (e.g., I := X.GetCount) to read data concurrently.]
Figure 3. Protected type example.
The built-in concurrency control features of protected types make them ideal to construct
transaction tables and other lock management mechanisms for an OODBMS. This is because
the "critical regions" in the transaction/lock tables are protected against conflicting updates
whereas tasks that only need to read data from the tables, i.e., tasks that are invoking
protected functions, will not block each other. Note that this is superior to encapsulating the
tables within an Ada task since an Ada task may service only one rendezvous at a time, i.e.,
tasks are single-threaded, whereas an instance of a protected type can service many
concurrent function calls. Protected types also are superior to semaphores for the same
reason: a semaphore grants totally exclusive access to a critical region regardless of whether
exclusive access is required. This precludes concurrent reads of the protected data.
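As a rough illustration of such a lock management mechanism, the sketch below uses a protected type as a read/write lock on a single database item. It is an assumption-laden toy rather than an OODBMS lock table: Item_Lock, Balance, Updater and Reader are hypothetical names, there is one lock for one item, and nothing stops a steady stream of readers from starving a writer.

```ada
with Ada.Text_IO;

procedure Lock_Demo is

   protected type Item_Lock is
      entry Acquire_Read;       --  shared: many readers at once
      entry Acquire_Write;      --  exclusive: one writer, no readers
      procedure Release_Read;
      procedure Release_Write;
   private
      Readers : Natural := 0;
      Writing : Boolean := False;
   end Item_Lock;

   protected body Item_Lock is
      entry Acquire_Read when not Writing is
      begin
         Readers := Readers + 1;
      end Acquire_Read;

      entry Acquire_Write when not Writing and Readers = 0 is
      begin
         Writing := True;
      end Acquire_Write;

      procedure Release_Read is
      begin
         Readers := Readers - 1;
      end Release_Read;

      procedure Release_Write is
      begin
         Writing := False;
      end Release_Write;
   end Item_Lock;

   Lock    : Item_Lock;
   Balance : Integer := 0;  --  the "database item" guarded by Lock

   task type Updater;
   task body Updater is
   begin
      for I in 1 .. 1_000 loop
         Lock.Acquire_Write;
         Balance := Balance + 1;
         Lock.Release_Write;
      end loop;
   end Updater;

   task type Reader;
   task body Reader is
      Last : Integer := 0;
   begin
      for I in 1 .. 1_000 loop
         Lock.Acquire_Read;
         Last := Integer'Max (Last, Balance);
         Lock.Release_Read;
      end loop;
   end Reader;

begin
   declare
      U1, U2 : Updater;
      R1, R2 : Reader;
   begin
      null;  --  wait here until all four tasks have terminated
   end;
   Ada.Text_IO.Put_Line ("Balance:" & Integer'Image (Balance));
end Lock_Demo;
```

The two barriers encode the pessimistic rules directly: a reader is admitted only while no writer holds the lock, and a writer only while nobody holds it. With two updaters performing 1000 increments each, the final balance is 2000.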
Tasks
Tasks are the "atoms" of Ada's concurrent programming paradigm; that is, the Ada run-time's
priority and dispatching policies operate at the task level. Similarly, transactions are the
"atoms" of a database manager's concurrency control policy. Transactions, like tasks, can be
executed concurrently. The database manager must schedule transactions according to their
priorities and concurrently execute them in a nonconflicting manner. Since Ada already has a
task priority system and a run-time scheduler, it would make the most sense for an Ada
OODBMS to use the Ada run-time to manage its transactions. This would make the
OODBMS less complex internally since it would not have to include its own scheduler. It
would also make the OODBMS easier to use since there would be only one set of priorities
and dispatching policies (Ada's) rather than two. The key to making this work is to enforce a
1-to-1 correspondence between Ada tasks and OODBMS transactions; that is, an Ada task may have
at most one active transaction at a time.
To get concurrent execution of two or more transactions, the transactions would have to be
executed within separate Ada tasks. This is shown in Figure 4.
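The arrangement can be sketched in Ada as follows. Transaction, Begin_Transaction and Commit are hypothetical stand-ins for an OODBMS interface (they only flip a flag and count commits), and the Worker task type and Commit_Log object are likewise assumptions made for this example.

```ada
with Ada.Text_IO;

procedure Transaction_Tasks is

   --  Hypothetical stand-in for an OODBMS transaction handle.
   type Transaction is record
      Active : Boolean := False;
   end record;

   --  Shared commit counter, protected against concurrent updates.
   protected Commit_Log is
      procedure Record_Commit;
      function Count return Natural;
   private
      Commits : Natural := 0;
   end Commit_Log;

   protected body Commit_Log is
      procedure Record_Commit is
      begin
         Commits := Commits + 1;
      end Record_Commit;

      function Count return Natural is
      begin
         return Commits;
      end Count;
   end Commit_Log;

   function Begin_Transaction return Transaction is
   begin
      return (Active => True);
   end Begin_Transaction;

   procedure Commit (T : in out Transaction) is
   begin
      T.Active := False;
      Commit_Log.Record_Commit;
   end Commit;

   --  Each task runs its transactions one after another, so it never
   --  has more than one active transaction.
   task type Worker;
   task body Worker is
      T1, T2 : Transaction;
   begin
      T1 := Begin_Transaction;
      --  database operations
      Commit (T1);
      T2 := Begin_Transaction;
      --  database operations
      Commit (T2);
   end Worker;

begin
   declare
      A, B : Worker;  --  transactions in A and B may run concurrently
   begin
      null;
   end;
   Ada.Text_IO.Put_Line ("Commits:" & Natural'Image (Commit_Log.Count));
end Transaction_Tasks;
```

Within one Worker the two transactions are serial by construction; concurrency comes only from running several Worker tasks, which is left entirely to the Ada run-time scheduler. The program reports 4 commits.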
[Figure 4: two Ada tasks shown side by side, each executing the same sequence:]
   T1 := Begin_Transaction;
   --database operations
   Commit (T1);
   T2 := Begin_Transaction;
   --database operations
   Commit (T2);
One active transaction at a time in a task means transactions in the same task execute serially.
Since Ada tasks may execute concurrently, transactions in different tasks also execute concurrently.
Figure 4. Transactions in a 1-to-1 mapping with Ada tasks.
An OODBMS that uses the Ada task model to implement concurrent transactions not only
offers the user a simpler interface but also a more powerful one. This is because an
application could control the queuing and dispatching policies of the Ada run-time if the Ada
compiler included the Real-Time Systems Annex. The features in this annex would be
especially important for an Ada OODBMS in a real-time system.
If an Ada OODBMS application must satisfy firm or hard real-time requirements, the
priorities of its tasks (and thus transactions) can be established in accordance with the Rate
Monotonic Scheduling (RMS) algorithm. The RMS algorithm assumes all of the tasks in a
system are executed at a constant periodic rate, e.g., "task A is executed every 100
milliseconds." RMS assigns priorities to tasks based on their frequency of execution. The task
most frequently executed gets the highest priority, and the least frequently executed task gets
the lowest priority.
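The priority assignment rule can be sketched in a few lines of Ada. The Periodic_Task record, the three example periods and the procedure name Assign_Priorities are assumptions made for illustration; the sketch only computes priorities and does not create or schedule any tasks.

```ada
with Ada.Text_IO;
with System;

procedure RMS_Demo is

   type Periodic_Task is record
      Period_Ms : Positive;         --  the task runs once every Period_Ms
      Priority  : System.Priority;  --  filled in by Assign_Priorities
   end record;

   type Task_Set is array (Positive range <>) of Periodic_Task;

   --  Rate monotonic rule: the shorter the period (the higher the
   --  rate), the higher the priority. A task's priority is the lowest
   --  priority plus the number of tasks with a longer period.
   procedure Assign_Priorities (Tasks : in out Task_Set) is
      Longer : Natural;
   begin
      for I in Tasks'Range loop
         Longer := 0;
         for J in Tasks'Range loop
            if Tasks (J).Period_Ms > Tasks (I).Period_Ms then
               Longer := Longer + 1;
            end if;
         end loop;
         Tasks (I).Priority := System.Priority'First + Longer;
      end loop;
   end Assign_Priorities;

   Lowest : constant System.Priority := System.Priority'First;
   Set    : Task_Set := ((100, Lowest), (500, Lowest), (250, Lowest));

begin
   Assign_Priorities (Set);
   for I in Set'Range loop
      Ada.Text_IO.Put_Line
        ("period" & Positive'Image (Set (I).Period_Ms)
         & " ms -> priority offset"
         & Integer'Image (Set (I).Priority - System.Priority'First));
   end loop;
end RMS_Demo;
```

For the periods 100, 500 and 250 ms the offsets above the lowest priority are 2, 0 and 1 respectively, so the 100 ms task runs at the highest priority. Tasks with equal periods receive equal priorities in this sketch.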
In summary, Ada's concurrent programming features are well-suited to OODBMS
construction. The concurrency control mechanism that is built into protected types makes
them an ideal choice to build the internal tables and critical regions of an OODBMS. This is
because protected types, unlike tasks and semaphores, allow concurrent read operations. Also,
if an Ada OODBMS allows a task to have at most one active transaction at any given time,
the Ada run-time's task scheduler can serve as the OODBMS transaction scheduler. This
makes things simple for the Ada programmer and greatly simplifies the internals of the
OODBMS. In addition, the task dispatching behaviour of protected types and the Ada run-time
can be optimised for real-time systems via the Real-Time Systems Annex.
Possible Enhancements to Ada
The Ada 95 standard is a quantum leap from its excellent predecessor, Ada 83. Ada 95
provides total support for object-oriented programming and has features that make it
convenient to add OODBMS capabilities to the language. Because the first object-oriented
database standard was not published until 1993 (ODMG-93), the Ada 95 standard was not
drafted with database requirements in mind. The ODMG standard has undergone many fundamental
changes in its last two releases, the first of which was in 1994 (ODMG-93, Release 1.1)
and the second in 1996. It would therefore not be reasonable to expect the Ada 95 committee
to have included special provisions for OODBMS construction in the standard, since at the
time there was no mature consensus on what features an OODBMS should have. As more
OODBMS products are built, a more mature consensus is beginning to form about what
features OODBMSs need.
Interview with Per Hedfors, consultant at ABB Business Systems
Questions
What experience do you have of RDBMSs?
Even though hierarchical databases are still in use, the RDBMS is what counts nowadays. BUS
uses RDBMSs because they fit the company's demands.
What experience do you have of OODBMSs?
According to Mr Hedfors, an OODBMS does not solve any problem for BUS that an
RDBMS cannot solve. The company's main purpose is to build and maintain business
systems for administration, where the data is not very complex. Mr Hedfors carried out an
investigation, OODBMS versus RDBMS, a couple of years ago. It turned out
that the OODBMS caused a performance loss. Mr Hedfors emphasises that
OODBMSs may have improved since then. A big disadvantage of the
OODBMS is that there is no standard query language comparable to SQL. If you create an
OODBMS with C++, the query language is C++ (unless you choose to convert it), and
this reduces your freedom.
It seems that OOPLs are very popular right now. How come OODBMSs do not
have the same popularity? Why do people choose an RDBMS instead of an OODBMS?
The market follows trends, and right now the RDBMS is what counts. There is
conservatism too: you know what you get.
When the RDBMS appeared in the late 1970s, it took at least 10 years for it to break
through. A company must consider many aspects: the cost of training, what the
benefits are, and so on. BUS mostly uses VB, and sometimes C.
Questions based upon the book "Database Systems", written by Connolly, Begg and
Strachan.
• The computer industry has changed significantly in the last few years. In database systems,
RDBMSs have been used for traditional business applications such as order processing,
inventory control, banking and so on. Existing RDBMSs have proven inadequate for
applications whose needs differ from those of traditional business applications.
Examples of such applications are:
CAD (Computer-Aided Design)
A CAD database stores data relating to mechanical and electrical design covering, for
example, aircraft and buildings. The data is characterised by a large number of types, each
with a small number of instances. A design may be very large, perhaps with millions
of parts. When a design change occurs, its implications must be propagated through the whole
design.
CASE (Computer-Aided Software Engineering)
A CASE database stores data relating to the stages of the software development lifecycle:
planning, requirements collection and analysis, design, implementation and so on. The design can
be extremely large, and there may be hundreds of staff involved with it.
Is the RDBMS old-fashioned? If yes, how does your company solve this problem?
For the demands of BUS, the RDBMS fits very well. But Mr
Hedfors believes that for complex systems, such as those mentioned above, OODBMSs are
better suited.
• Normalisation generally leads to relations that do not correspond to entities in
the real world. The many relations cause fragmentation in the physical representation.
This is inefficient, leading to many join operations during a query, and the join is the most
costly operation to perform.
Do you recognise this, and are there advantages that compensate for it?
The statement above can be quite right. BUS solved this by using
components in their design. They encapsulate everything that concerns a customer in one
unit with, for example, its own database, and everything concerning an order, including its
database, in another unit. The order knows who its customer is, but there are no relations
in the database between these objects. Between these units there is an interface that does not
have to be changed, even if the customer or order is changed.
• An RDBMS has only a fixed set of operations, such as set-oriented and tuple-oriented
operations (a tuple is a row in a table), and SQL-92 does not allow new operations to be specified.
This is too restrictive to model the behaviour of real-world objects. For example, a
GIS (Geographic Information System) application needs operations for distance and
intersection.
What is your experience?
An RDBMS can hold stored procedures, which can be reused; that takes care of such specific
operations.
• "Impedance mismatch" is a big problem, because we are mixing different programming
paradigms. SQL is a declarative language that handles sets of rows, whereas a high-level
language like C is a procedural language that can handle only one row of data at a time.
SQL provides the built-in data types Date and Interval, which are not available in
traditional programming languages. It has been estimated that as much as 30% of programming
effort and code space is expended on this type of conversion. The RDBMS does not provide
iteration (while ...) or selection (if ...) constructs.
Is "impedance mismatch" a problem for BUS?
You can solve this with logic outside the database; it is not a problem for us.
• Many RDBMSs do not allow the storage of BLOBs. Some RDBMSs can store, for
example, a picture as a BLOB. However, the picture can only be stored and displayed. It is
not possible to manipulate the internal structure of the picture, nor is it possible to display
or manipulate part of the picture.
Do you recognise this?
BUS does not have that problem, but Mr Hedfors understands that it can be a problem, and
that this is where an OODBMS can be useful.
Conclusion
The first thing we did was to divide the task of writing this paper into three parts (history and
RDBMS; OODBMS; and whether Ada 95 would be a good OOPL for developing an OODBMS).
Then we gathered as much information as we could by searching the Internet and by
reading books. We often held short meetings where we discussed and shared what we
had found. To see whether OODBMSs are used in practice, we interviewed a consultant at a company.
We found this subject very interesting. There is a lot of information available, so we had to
limit the scope.
We are at a historic point in software system research. Today's operating and database
systems are built on designs from a quarter-century ago. These systems do not address today's
computing environment naturally, and cannot adapt because they are so large and brittle.
Our belief is that OODBMSs will be used more and more where RDBMSs are
used today. RDBMSs will still be used in many areas, such as business systems and other systems that
do not require the more complex functionality that an OODBMS provides. This shift will happen
when the OODBMS is accepted as a standard and more developers learn how to build
OODBMSs.
References
Rob Mattison, "Understanding Database Management Systems", McGraw-Hill, ISBN 0-07-049999-3, 1997
Thomas Connolly, Carolyn Begg, Anne Strachan, "Database Systems", Addison-Wesley, ISBN 0-201-34287-1, 1998
Fred R. McFadden, Jeffrey A. Hoffer, Mary B. Prescott, "Modern Database Management", Addison-Wesley, ISBN 0-8053-6054-9, 1999
Norman H. Cohen, "Ada as a Second Language", McGraw-Hill, ISBN 0-07-011607-5, 1996
"Ada 95 Reference Manual", International Standard ANSI/ISO/IEC-8652:1995, Intermetrics, Cambridge, Mass., 1995
http://www.inherit.se/sv/index_db.html
http://www.cetus-links.org/oo_data_bases.html
http://www.cpsc.ucalgary.ca/~kremer7courses/547/oodb/index.htm
http://www.adahome.com
Questions
1. What are the advantages and disadvantages of object-oriented databases?
2. Why are tasks in Ada well suited to building databases?
3. What is the goal of normalisation?