Object-relational database systems Yong Yao CS632 April 17, 2001

advertisement
Object-relational database systems
Yong Yao
CS632
April 17, 2001
Content




Introduction
Michael Stonebraker: Inclusion of New Types in
Relational Data Base Systems.
Michael Stonebraker, Greg Kemnitz: The Postgres
Next Generation Database Management System.
PREDATOR System Design Document: A detailed
description of internal design decisions and
implementation details.
Introduction

The relational model



Dominant mainstream approach
Work well on business data processing
But things have changed drastically,
nontraditional problems.
Taxonomy of DBMS Application
Yes
No
Use SQL
data Simple
complexity
Complex
Simple data
Yes
Relational
DBMS
No
Use SQL
File
System
data Simple
complexity
Complex
OODBMS
Yes
Relational
DBMS
No
•A programming
language with a
type system
File System OODBMS
Use SQL
data Simple
complexity
Complex
•Persistent object
ORDBMS
Yes
No
Relational
DBMS
ORDBMS
File System
OODBMS
Use SQL
data Simple
complexity
Complex
Features



Base type extension
Inheritance
Complex object
Inclusion of New Types in Relational
Database Systems
Michael Stonebraker
EECS Dept.
University of California, Berkeley
Example


Represent a set of boxes in the DBMS and a simple query- find
all the boxes that overlap the unit square(0,1,0,1).
RDBMS:
Create box(id=i4, x1=f8, x2=f8,y1=f8,y2=f8)
select *
from box
where ! (box.x2<=0 or box.x1>=1 or box,y2<=0 or
box.y1>=1)



Too hard to understand
Too complex to optimize
Too many clauses to check
Example (cont’d)
Create box(id=i4,desc=box)
select *
from box
where box.desc!!”0,1,0,1”


Box: new data type
“!!”: overlap operator with two operands of data type box,
return a boolean
Operators for Boxes
Motivation

Needs of business data processing
applications
Geographic Information System
 New Types: Point, Line, Polygon;
 New operators: distance, intersection ;
 New access methods: R-trees, KDB trees;
Conditions
1.
2.
3.
4.
The definition of user-defined data types
The definition of new operators for these
data types
The implementation of new access methods
for data types
Optimized query processing for commands
containing new data types and operators
Definition of New Types

Follow a registration process
define type-name length = value,
input = file-name,
output = file-name


Occupy a fixed amount of space
Conversion routines
Definition of New Operators
Safety loophole

Problem:



an ADT routine which has an error can overwrite
DBMS data structures
Unclear whether such errors are due to bugs in
the user routines or in the DBMS
Solutions:



Run in separate address space
Build a language processor
Provide two environments
New access methods



Users can add new access methods to
support user-defined data types
Goal: extensibility
Interface:



Conditions for the operators
Information on the data types of operators
Define set of operators
Example: B-tree

Conditions for operators:
Information on the data types of operators (B-tree)
•“<=“ is required
•Type: specific type, fixed, variable, fix-var, type1, type2
New set of operators (B-tree)
F1=(value - low-key) / (high-key – low-key)
F2=(high-key – value) / (high-key – low-key)
Implementing New Access Methods

A collection of procedure calls

open (relation-name)
close (descriptor)
get-next(descriptor, OPR, value, tuple-id)
insert (descriptor,tuple)
delete(descriptor, tuple-id)
replace(descriptor, tuple-id, new-tuple)

build(descriptor,keyname,OPR)





Hard problem- work with transaction management


log pages – simple, but may suffer from
performance penalty
log events

Event-oriented interface



REDO(T)
UNDO(T)
LOG(event-type, event-data)
Access path selection

Four pieces of information for optimization

Stups: estimate the expected number of records
…where rel-name.field-name OPR value

A second selectivity factor S:
…where relname-1.field-1 OPR relname-2.field-2


Feasibility of merge-sort
Feasibility of hash-join
Define operator token= AE,
……
Stups=1
S= min(N1.N2).
merge-sort with AL,
hash-join
Generate query processing plan

relname-1.field-1 OPR relname-2.field-2




Merge sort
Iterative substitution
Hash Join
Relname.field-name OPR value



any access method with field-name as a key
Secondary index
Sequential search
Summary




How to extend an abstract data type
How to define new access method
Integrate new access method with
transaction management
Generation of optimized query processing
plan
The POSTGRES Next-Generation
Database Management System
Michael Stonebraker
Greg Kemnitz
Three kinds of services for DBMS

Traditional data management


Object management


Simple data type
Complex data types, e.g. bitmaps, icons, text…
Knowledge management

Store and enforce a collection of rules
Example- newspaper layout




store and manipulate text and graphics, Bill
customers for advertisement.
Data: customer information
Object: text, pictures and icons
Rules: control newspaper layout
POSTGRES Data Model-Design criteria


Traditional relational DBMSs data model: a
collection of named relations
Orientation toward database access from a
query language


Interact with database by query languagePOSTQUEL
User defined functions
Design criteria

Orientation toward multilingual access

Two selections



One language tightly coupled to the system
Multilingual
POSTGRES is programming language
neutral
Design criteria

Small number of concepts


As few as possible
Four constructs:




Class
Inheritance
Type
Function
Data Model - Class


Class(constructed type or relation) : a named
collection of instances of objects.
Instances(record or tuple) : has the same
collection of attributes
create EMP(name=string, salary= float, age=int)
Data model - Inheritance

Inherit data elements from other classes
create EMP(name=string, salary= float, age=int)
create SALESMAN(quota=float) inherits EMP

Multiple inheritance – only create
objects without ambiguity
EMP
SALESMAN
Three kinds of classes



Real class: instances are stored in the database
Derived class :view
Version: store differential relative to its base class





Initially has all instances of the base class
Freely updated to diverge from the base class
Updates to the version do not affect the base class
Updates to the base class are reflected in the version
Supported by the rule system
Version Example
create version my-EMP from EMP
Two classes generated:
EMP-MINUS(deleted-OID)
EMP-PLUS(all-fields-in EMP, replaced-OID)



Retrieve: retrieve EMP-PLUS instead
Insert: All new instances for EMP or my-EMP will be
added into EMP-PLUS;
Delete: move records from EMP-PLUS to EMP-MINUS
Data model - Types

Base types:




Int, float, string
construct new base types.
Arrays of base types:
Composite types:

Construct complex objects: attributes contain
other instances as part or all of their value
Composite types

Contains zero or more instances of the same
class
create EMP (… , manager=EMP, coworkers=EMP)

Set: value is a collection of instances from all
classes
create EMP(… , hobbies=set)
{softball, skiing, skating…}
Data Model - Functions

C functions




Arbitrary C procedures
Can not be optimized by the POSTGRES
Argument: base types or composite types
Inherited down the class hierarchy


overpaid(EMP)
overpaid(SALESMAN)
Functions

Operators


Utilize index
Functions with one or two operands



{ALT, ALE, AE, AGT, AGE}
Allow new access methods
POSTQUEL functions

Any collection of commands in the POSTQUEL
Define function high-pay returns EMP as
Retrieve (EMP.all)
Where EMP.salary>50000
POSTQUEL





Set-oriented query language
Support nested queries
Transitive closure
Support for inheritance
Support for time travel

Allow a user to run historical query
retrieve (EMP.salary)
from EMP [T]
where EMP.name=“Sam”


Maintain two different physical collections of records
Vacuum cleaner: a daemon moves records
The Rules System

Requirements:







referential integrity
View management
Triggers
Integrity constraints
Protection
Version control
Design a general purpose rules system
Syntax of rules
ON event (TO) object WHERE
POSTQUEL-qualification
THEN DO [instead]
POSTQUEL-command(s)




on new EMP.salary where
EMP.name=“Fred”
then do replace
E (salary=new.salary)
from E in EMP
where E.name=“Joe”
Event: retrieve, replace, delete, new …
Object: name of a class or class.column
POSTQUEL-qualification: normal qualification
POSTQUEL-commands: a set of POSTQUEL commands
Forward and Backward chaining

Rules specify additional actions, and these actions
may activate other rules
on new EMP.salary where
EMP.name=“Fred”
then do replace
E(salary=new.salary)
from E in EMP
where E.name=“Joe”
on retrieve to EMP.salary where
EMP.name=“Joe”
then do instead retrieve
(EMP.salary)
where EMP.name=“Fred”
Implementation of Rules

Record level processing



Rules system is called when individual records are
accessed
Place a marker on the record
Query rewrite module


Perform poorly for a large number of small-scope
rules
Desirable when there are a small number of
larger-scope rules
When rules are activated

Immediate-same transaction
Immediate-different transaction
Deferred-same transaction
Deferred-different transaction

Currently only implements the first option



Storage System

‘no-overwrite’ storage manager




Old record remains in the database whenever an
update occurs
Instantaneous crash recovery
Support time travel
stable main memory required
Performance

Better than UCB-INGRES
But

Compare to the Cattell system, loses by about a
factor of two.
Comments


POSTGRES allows an application designer to
trade off performance for data independence
Imports only specific user functions into its
address space
PREDATOR
DESIGN AND INPLEMENTATION
PARVEEN SESHADRI
System overview



A client-server database system
Goal: Extensibility - adding the ability to
process new kinds of data
Basic components


Extensible table of Enhanced Abstract Data Type
(E-ADTs)
Extensible table of query processing engines(QPEs)
Server Architecture
Enhanced Data Types

Standard ADT




Specify the storage format for values
Specify methods that can be invoked on values
Specify how some methods can be matched using
indexes
Motivation: methods can be very expensive
Data.image.sharpen().clip()
Improvement
Data.image.sharpen().clip()



It is unnecessary for Sharpen to compress and write
its result to disk
-> Pass result directly in memory to Clip
Rewrite the expression
-> Data.image.clip().sharpen()
No need to retrieve the entire image
E-ADT



Expose the semantics of its methods
Query optimizations are performed using
these semantics
Find more at :
The Case for Enhanced Abstract Data Types.
VLDB 1997
Thank You
Download