VIRTUAL INFORMATION
IN THE
INFOPLEX DATABASE COMPUTER
by
LAWRENCE ABRAM KRAKAUER
B.S.E., Princeton University
(1978)
SUBMITTED IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE
DEGREE OF
MASTER OF SCIENCE
at the
MASSACHUSETTS
INSTITUTE OF TECHNOLOGY
June 1980
© Massachusetts Institute of Technology 1980
Signature of Author
Sloan School of Management
May 21, 1980
Certified by
Stuart E. Madnick
Thesis Supervisor
Accepted by
Michael S. Scott Morton
Chairman, Department Committee
VIRTUAL INFORMATION
IN THE
INFOPLEX DATABASE COMPUTER
by
LAWRENCE ABRAM KRAKAUER
Submitted to the Sloan School of Management
on May 21, 1980 in partial fulfillment of the
requirements for the Degree of Master of Science in
Management
ABSTRACT
The concept of virtualness is extended to information. Virtual information is, informally speaking, information that is derived from other information. The term virtual information was proposed in 1973 in a paper by Folinus, Madnick, and Schutzman{R7}. We extend that paper by defining virtual information in a precise manner. Database concepts related to virtual information (although typically not called virtual information in the literature) are reviewed. These concepts include data typing, data type abstractions, security, data independence, and multiple views of data. Finally, the function of virtual information in the INFOPLEX database computer is discussed{R15}.
Thesis Supervisor: Dr. Stuart E. Madnick
Title: Associate Professor of Management Science
Table of Contents
Title Page
Abstract
Table of Contents
Introduction
Chapter 1. Virtual Information
Chapter 2. Applications of the Virtual Information Concept
Chapter 3. Virtual Information Functions in INFOPLEX
Conclusion
Introduction
The concept of virtualness plays a prominent role in the world of computers. For example, the VM/CMS operating system contains virtual machines, virtual memory, virtual disks, virtual card readers, virtual card punches, and virtual printers{R26}. In this paper, we extend the concept to information. The term virtual information was proposed in 1973 in a paper by Folinus, Madnick, and Schutzman{R7}. We extend that paper by defining virtual information in a precise manner. Database concepts related to virtual information (although typically not called virtual information in the literature) are reviewed. Finally, the function of virtual information in the INFOPLEX database computer is discussed{R15}.

In chapter 1, the concept of virtualness is described by example. Then a model of information is presented and virtual information is defined in terms of that model. An analogy is drawn between virtual information and the concepts of data type abstraction and data independence. Chapter 2 describes one application of virtual information in a non-DBMS context, and several applications in a DBMS context. Chapter 3 relates the ideas presented in the first and second chapters to the INFOPLEX data base computer.
Chapter 1.
Virtual Information
The Concept of Virtualness
Bruce sells watches with leather and metal bands. A customer asks Bruce for a brand X watch with a metal band. Bruce has a brand X watch with a leather band, but he has a spare metal band in a drawer. What is Bruce's reply?

Mary-Louise is invited to John's party on the condition that she brings dessert. She has flour, sugar, eggs, vanilla, baking soda, salt, chocolate chips, and her mother's recipe for chocolate chip cookies. Can she attend the party?

David is also invited to John's party. He has no ingredients and no recipe (although his mother is a great cook), but he does have $10. The bakery is a two minute walk from David's front door. Can he attend the party?

In each of the above stories the answer to the question posed is effectively yes, conditional on there being sufficient time to complete the obvious transformation. We say that Bruce has a virtual brand X watch with a metal band. Mary-Louise and David have virtual cookies. For example, as far as the customer is concerned, Bruce has a brand X watch with a metal strap. The customer is concerned only with results, not implementation.
The advantages of utilizing the concept of virtualness are obvious. For example, Bruce would have to keep a larger inventory if he kept "actual" watches only. David would be stuck with actual cookies if he had stocked chocolate chip cookies and if John had requested raisin cookies instead of just dessert. On the other hand, the major disadvantage is the speed at which each person can respond to a request.
What makes Mary-Louise's cookies virtual and when do they become edible? This question implies that cookies are more desirable in one form than another. We will refer to the more desirable, edible, concrete, usable form of the cookie as the materialization. The less tasty form of the cookie is the basis. Specifically, the cookie's basis comprises the ingredients. The recipe is the prescription for materializing the cookies from the ingredients. Of course, the ingredients themselves might be virtual, in which case they would be composed of even more basic ingredients. The most basic set of ingredients is referred to as the cookie's primary basis; everything else is derived.
Cookies become edible when they are materialized, but what made them virtual in the first place? Suppose we kept
an
in
cookie
edible
yet
However,
view
of
materialized)
until
the
in
in
ant
outside
cookie
a
drawer
the
drawer
form
usable
opened.
is
point
of
The cookie outside the
the drawer.
drawer is derived from the cookie inside
the
kept
longer
no
a materialized cookie from the
there is
the
and
drawer,
The cookie is not
ingredients around.
(not
a
is
the
but
drawer,
not a virtual cookie
because the respective materializations are identical.
As another example, suppose that cookies are made from ingredients, but are temporarily stored in a drawer before being consumed. Let CI and CO denote cookies inside and outside of the drawer respectively. The ingredients, I, compose the primary basis for CO and CI, and the basis for CI. From the previous examples, CO is not virtual with respect to its basis, CI, while CI is virtual with respect to its basis, I. It seems unreasonable, however, to say that CO is not virtual when its basis is. Therefore CO is virtual as well.
A Model of Information
We postulate the existence of bit strings called atomics. Atomics may be created, destroyed, and tested. An object t, denoted by O_t, is simply the name used to refer to an atomic. The association, or binding, of an object to an atomic gives a semantic meaning to the atomic, and the bound atomic therefore contains information. For example, "Mike's age" is an object and 24 is an atomic. Binding "Mike's age" to 24 produces the information that Mike is 24. (We will use more descriptive representations than bit strings to describe atomics.) An atomic that is bound to an object t is called the materialization of O_t and is denoted by τO_t. The binding process is called the materialization process.
The database is the set of all objects. Suppose the database comprises q objects, {O_1,...,O_q}. We say that τO_i is independent of τO_j iff for every set of materializations {τO_1,...,τO_q}, no change in τO_j results in a change in τO_i. If τO_i is not independent of τO_j then τO_i is said to be derived from τO_j. If τO_i is independent of the materializations of all other objects in the database then O_i is said to be a basic object.

The materialization of a basic object is the process of binding the object to an atomic. The materialization of a derived object is the process of mapping the materializations of other objects onto a new atomic and binding that atomic to the object. For example, consider the objects "Mike's birth year", "Mike's age", and "current year". The materialization of the object "Mike's birth year" is the process of mapping the materialization of Mike's age and the current year onto an atomic and binding that atomic to "Mike's birth year".
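The two materialization processes can be illustrated with a minimal Python sketch. The names `database`, `bind`, and `materialize_derived` are hypothetical and chosen only for this illustration; the values 24 and 1980 are the ones used in the running example, and the subtraction is an assumed mapping for it.

```python
# Hypothetical sketch: `database`, `bind` and `materialize_derived` are
# illustrative names, not terms from the thesis.

database = {}  # object name -> atomic currently bound to it


def bind(obj, atomic):
    """Materialize a basic object: bind the object directly to an atomic."""
    database[obj] = atomic


def materialize_derived(obj, basis, mapping):
    """Materialize a derived object: map the materializations of the
    objects named in `basis` onto a new atomic and bind it to `obj`."""
    atomic = mapping(*(database[name] for name in basis))
    database[obj] = atomic
    return atomic


bind("Mike's age", 24)
bind("current year", 1980)
birth_year = materialize_derived(
    "Mike's birth year",
    ["Mike's age", "current year"],
    lambda age, year: year - age,  # assumed mapping for this example
)
print(birth_year)  # 1956
```

Here the two basic objects are bound directly, while the derived object is bound only after its mapping has been applied to their materializations.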
The materialization of a derived object is defined in terms of other objects, called the object's basis. The basis of O_t is denoted by B_t. More precisely, the basis is an ordered set of objects, B_t = {O_1,...,O_p}, where p>0. The size of B_t is denoted by S_t, so p=S_t. The ith member of B_t is called a component of O_t and is denoted by Γ_t^i. For example, "Mike's age" is the first component of "Mike's birth year". The materialization of B_t, τB_t, is defined to be {τO_1,...,τO_p}. The materializations of two bases, τB_u and τB_v, are identical iff S_u=S_v and τΓ_u^i=τΓ_v^i for i in [1,S_u]. The rule for materializing τO_t from τB_t is called the materialization mapping, M_t.

For O_i to be a component of O_t, we require that τO_t be derived from τO_i. The rationale is that it doesn't make sense to define τO_t in terms of objects whose materializations are completely unrelated to τO_t. Another requirement imposed on the components of O_t is that the components must form a complete basis for O_t. A basis, B_t, is complete if the following is true. If O_t is materialized at two different instances to produce τO_t1 and τO_t2, then τO_t1=τO_t2 iff τB_t1=τB_t2. Hence τO_t is said to be completely determined by τB_t. A necessary condition for completeness is that all objects that τO_t is derived from are either in B_t or are objects that τB_t is derived from.
An object graph is used to show the relationships between objects. An object graph comprises nodes and directed edges. Each node, N_t, corresponds to an object, O_t. A directed edge, E_u,v, leads from N_u to N_v iff O_v is a component of O_u. Therefore, B_t corresponds to the children of N_t. When drawing object graphs, we will use the convention that the leftmost child of N_t corresponds to the first component and that components two through S_t follow from left to right. Cycles in the graph are not permitted. A leaf is a node with no edges leading from it. A node, N_t, is a leaf iff O_t is basic; otherwise N_t is derived. The primary basis of O_t is the set of all basic objects whose nodes are descendants of N_t in the graph.
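As a rough illustration of the object graph, the following Python sketch stores each object's ordered basis as a list of children and collects the primary basis by walking down to the leaves. The dictionary layout, the function names, and the sample objects (including "adult?") are assumptions made for this sketch only.

```python
# Illustrative only: the dictionary layout and function names are assumptions.
# Each key is an object; its value is the ordered basis (the node's children).
object_graph = {
    "adult?": ["age"],
    "age": ["birth year", "current year"],
    "birth year": [],
    "current year": [],
}


def is_basic(obj):
    """A node is a leaf (no outgoing edges) iff its object is basic."""
    return not object_graph[obj]


def primary_basis(obj):
    """Return the set of basic objects whose nodes are descendants of obj."""
    found = set()
    for component in object_graph[obj]:
        if is_basic(component):
            found.add(component)
        else:
            found |= primary_basis(component)
    return found


print(sorted(primary_basis("adult?")))  # ['birth year', 'current year']
```

Because cycles are not permitted in an object graph, the recursion always terminates.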
We define a cell in terms of its materialization. The materialization of a cell is a substring of an atomic. If we can decompose τO_t into a set of cells {C_t^1,...,C_t^p}, where p=S_t, such that each τC_t^i is completely determined (in the sense used earlier) by the corresponding component materialization τΓ_t^i, then we say that O_t is cellular. An example of a cellular object and a non-cellular object is shown in figure 1. The materialization of a cellular object, O_t, is defined to be identical to its basis, τB_t (τO_t=τB_t), iff τC_t^i=τΓ_t^i for each i in [1,S_t]. If an object, O_t, is not cellular then, by definition, τO_t≠τB_t.

Figure 2 shows an instance of two tables (relations), the corresponding object graph, and the materialization of each object. Objects PERSON and YEAR are the tables. Some nodes are labelled with their materializations, instead of names, to avoid the tedious process of naming each object. Objects are connected to materializations by dashed lines. The (materialization of the) person's age is derived from (the materialization of) her birth year and the current year. If the current year were to change to 1981, then Joanne's age would materialize as 16. Note that all objects are cellular, except for '15' and '14'.
[Figure 1: A cellular object and a non-cellular object, with their components and materializations]
[Figure 2: Tables PERSON and YEAR, the corresponding object graph, and the materializations of the objects]
We are now in a position to define virtual information. Object O_t is virtual iff one of the following two statements is true.
(i) τO_t≠τB_t
(ii) the component Γ_t^i is virtual for some i in [1,S_t]
Thus, O_t is virtual if its materialization is not identical to the materialization of its basis or if an object in its basis is virtual. If statement (i) is true then O_t is said to be virtual with respect to B_t. In figure 2, the objects '15' and '14' are virtual with respect to their bases. Objects t1, t2, and PERSON are virtual, but not with respect to their bases.
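The two-part definition of a virtual object lends itself to a short recursive check. The Python sketch below is only an illustration: the `Obj` class, the tuple-of-cells representation, and the sample values (a birth year of 1965) are assumptions, and basic objects are treated as non-virtual because they have no basis to differ from.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    name: str
    value: tuple                                # materialization, as a tuple of cells
    basis: list = field(default_factory=list)   # ordered components


def virtual_wrt_basis(obj):
    """Statement (i): the materialization differs from that of the basis."""
    if not obj.basis:  # assumption: a basic object is never virtual
        return False
    basis_value = tuple(cell for comp in obj.basis for cell in comp.value)
    return obj.value != basis_value


def is_virtual(obj):
    """Statement (i) holds, or statement (ii): some component is virtual."""
    return virtual_wrt_basis(obj) or any(is_virtual(c) for c in obj.basis)


name = Obj("name", ("Joanne",))
birth = Obj("birth year", (1965,))              # hypothetical value
year = Obj("current year", (1980,))
age = Obj("age", (15,), [birth, year])          # 1980 - 1965, not a copy of its basis
t1 = Obj("t1", ("Joanne", 15), [name, age])     # tuple identical to its basis

print(virtual_wrt_basis(age), is_virtual(age))  # True True
print(virtual_wrt_basis(t1), is_virtual(t1))    # False True
```

The tuple `t1` is virtual only through statement (ii): its materialization matches its basis, but one of its components is itself virtual.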
As a corollary to the definition of virtual information, we show that O_t is virtual if it is derived from a virtual object, O_u (although O_t is not necessarily completely described by O_u). If O_t is derived from O_u then a path must exist between N_t and N_u in the object graph. A path from N_t to N_u, P_t,u, is an ordered set of directed edges leading from N_t to N_u, where P_t,u = {E_t,i1,...,E_ik,u}. From the definition of virtual information, O_ik is virtual because O_u is virtual. Also, each object on the path is virtual if its successor on the path is virtual. Therefore, by induction, O_t must be virtual.
For another example of virtual information, consider the relations PAY and VPAY in figure 3. Each tuple (O_t1, O_t2, etc.) of PAY comprises a basic name object and a basic salary object. Suppose we want to study salary distributions within an organization. Since salary information is confidential, and executive salaries themselves are top secret, we define a new relation VPAY consisting only of the salaries under $50,000. In System R {R6} VPAY could be defined as follows.
DEFINE VIEW VPAY AS:
SELECT SALARY
FROM PAY
WHERE SALARY < 50000
The corresponding object graph and example materialization are also in figure 3. The System R statements define B_VPAY as O_PAY and M_VPAY in terms of the salary cell of τO_PAY. The only virtual object in figure 3 is O_VPAY. The reason that O_VPAY is virtual is that, although it is cellular, τO_VPAY≠τO_PAY. We can generalize this observation by noting that if the information embodied in the materialization of an object, τO_t, is a subset of the information embodied in τB_t, then O_t is a virtual object.
[Figure 3: Relations PAY and VPAY, the corresponding object graph, and the materializations of the objects (not all shown)]
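The VPAY view can be tried out with Python's standard `sqlite3` module. This is only a sketch under stated assumptions: SQLite's `CREATE VIEW` syntax stands in for the System R `DEFINE VIEW` statement, and the three rows of PAY are made up for the illustration.

```python
import sqlite3

# SQLite stands in for System R here; the rows below are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PAY (NAME TEXT, SALARY INTEGER)")
conn.executemany(
    "INSERT INTO PAY VALUES (?, ?)",
    [("Ann", 30000), ("Bob", 45000), ("Cal", 90000)],
)

# The view is never stored: it is materialized from PAY each time it is read.
conn.execute("CREATE VIEW VPAY AS SELECT SALARY FROM PAY WHERE SALARY < 50000")

pay = conn.execute("SELECT NAME, SALARY FROM PAY ORDER BY SALARY").fetchall()
vpay = conn.execute("SELECT SALARY FROM VPAY ORDER BY SALARY").fetchall()
print(vpay)  # [(30000,), (45000,)]
```

The materialization of the view holds less information than the materialization of its basis (no names, no salaries of $50,000 or more), which is what makes it virtual in the sense defined above.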
Virtual Information: A Functional Perspective
Virtual information enables construction of new objects from existing objects, to make the new objects appear as real as the existing objects. This is analogous to data type abstraction in programming languages{R18,R20}. Data type abstraction enables construction of new data types from existing data types, to make the new data types appear as real as the existing data types. We proceed to develop this analogy further.
Liskov and Zilles define an abstract data type to be "a class of abstract objects which is completely characterized by the operations available on those objects"{R18}. In other words, to define an abstract data type one simply defines
(i) the operations that can manipulate the data type,
(ii) the internal representation of the data type, and
(iii) the implementation of each operation.
A programming language with an abstract data type capacity (such as CLU{R20} and, to a lesser extent, PASCAL{R25}) provides a set of primitive data types (for example, real, boolean, integer, etc.) to enable definition of new data types. Abstract data types can also be used to define new abstract data types. As an example, consider the definition of an integer stack{R18}. First, the relevant operations are enumerated: push, pop, looktop, erasetop, and empty (the stack). Second, the internal representation is specified to be an array of integers with a pointer to the top of the stack. Third, each operation is defined in terms of manipulation of the stack and the pointer to the top of the stack.
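The three steps of the integer stack definition can be sketched in Python. The behaviour assigned to each operation is an assumption made for illustration (looktop reads the top without removing it, erasetop removes it without returning it, pop does both, and empty tests for emptiness); {R18} is not quoted here.

```python
class IntStack:
    """Integer stack: an array of integers plus a pointer to the top."""

    def __init__(self, capacity=100):
        self._items = [0] * capacity  # internal representation: fixed array
        self._top = -1                # index of the top element, -1 if empty

    def push(self, value):
        if self._top + 1 == len(self._items):
            raise OverflowError("stack is full")
        self._top += 1
        self._items[self._top] = int(value)

    def looktop(self):
        # Assumed meaning: read the top element without removing it.
        if self.empty():
            raise IndexError("stack is empty")
        return self._items[self._top]

    def erasetop(self):
        # Assumed meaning: discard the top element without returning it.
        if self.empty():
            raise IndexError("stack is empty")
        self._top -= 1

    def pop(self):
        value = self.looktop()
        self.erasetop()
        return value

    def empty(self):
        return self._top == -1


s = IntStack()
s.push(3)
s.push(7)
print(s.pop(), s.looktop(), s.empty())  # 7 3 False
```

A caller uses only the five operations; the array and the top pointer could be replaced by another representation without changing any calling code.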
Implementation details are very important when defining an abstract data type. However, when using the data type only the behavior -- the functionality -- of the data type is of importance. The programmer needs only to know how to manipulate the data type. His task is simplified by not being complicated by implementation details. The result should be that programs will be easier to write and maintain{R18}.
Virtual information is analogous to data type abstraction both in terms of implementation and in terms of benefits. The only difference is that there is no data type abstraction concept analogous to non-virtual information. A virtual object is completely characterized by operations on that object (as defined by the materialization mapping). We therefore take a functional perspective of virtual information.

To investigate this perspective, it is useful to view information along a spectrum from pure data to pure algorithm{R7}. According to {R7}, "recorded facts which are independent of other information in the data base may be considered as pure data." In terms of our definition, basic objects are pure data objects. An example of pure algorithm is the trigonometric sine function. In between pure data and pure algorithm lies derived information. A derived object is a combination of pure data and pure algorithm. A virtual object is a derived object with the additional quality that the materialization of the object's basis not be identical to the materialization of the object itself.

Suppose employee salary information is represented by a table comprising tuples that comprise three fields: the employee name (EMPN), the employee's weekly salary in dollars (WSAL), and the employee's annual salary in dollars (ASAL). Although a functional dependency exists between τO_WSAL and τO_ASAL, the materializations are independent. The table is shown in figure 4.
[Figure 4: A non-virtual table A with fields EMPN, WSAL, and ASAL: object graph and object materializations]
The functional dependency is undesirable because a change in τO_WSAL will not be automatically reflected by a corresponding change in τO_ASAL. Figure 5 shows how the concept of virtual information can be applied to the tables in figure 4 to solve the dependency problem. In figure 5 the basis of table A' (B_A') is different from B_A in figure 4, while τO_A' is identical to τO_A. The pure data (WSAL) is combined with a pure algorithm (multiply by 52), via the materialization mapping, to give virtual information (ASAL). Note that the functionality of O_A' is identical to that of O_A; standard database operations such as insert, delete, update, and retrieve still apply assuming that M_A' and (M_A')^-1 are suitably defined (for example, assuming that the update operation divides ASAL by 52 before updating WSAL). Of course, M_A' may not always be invertible. Several examples of this are given in {R16}.
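A small Python sketch shows how a stored weekly salary, a mapping (multiply by 52), and its inverse (divide by 52) can make the annual salary behave like a stored field. The class name `EmployeeRow` and the sample values are illustrative assumptions, not part of the original tables.

```python
class EmployeeRow:
    """One tuple of a table like A': WSAL is stored, ASAL is virtual."""

    WEEKS_PER_YEAR = 52

    def __init__(self, empn, wsal):
        self.empn = empn
        self.wsal = wsal  # pure data: the only salary actually stored

    @property
    def asal(self):
        # Materialization mapping: multiply the weekly salary by 52.
        return self.wsal * self.WEEKS_PER_YEAR

    @asal.setter
    def asal(self, value):
        # Inverse mapping: divide by 52 before updating WSAL.
        self.wsal = value / self.WEEKS_PER_YEAR


row = EmployeeRow("GERTRUDE", 600)  # sample values for illustration
print(row.asal)   # 31200
row.asal = 26000  # update through the virtual field
print(row.wsal)   # 500.0
```

Retrieval and update of `asal` look the same to the caller as they would for a stored field, and the two salaries can no longer drift apart.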
Table A' in figure 5 is virtual with respect to B_A', and table A' can be manipulated exactly as if τO_A' were identical to τB_A'. Once O_A' is defined in terms of M_A' and B_A', O_A' can be manipulated without regard to its implementation. The insulation of function and implementation is extremely important to database and database system design -- just as it is important to programming languages. In fact, this insulation is crucial to the support of data independence.
[Figure 5: A virtual table A' with fields EMPN, WSAL, and ASAL: object graph and object materializations]
Data independence has been defined in various fashions, both precisely {R24} and imprecisely {R7,R15}, in the literature. For our purposes it is sufficient to say that data independence exists if a change in τB_t does not necessarily imply a change in τO_t, assuming that M_t is allowed to change. For example, data independence exists in the relations of figure 5 if a new field, EMPAGE, can be added to O_B without changing τO_A' (note that B_A' is contained in O_B). As another example, data independence exists at the applications programming level if changes to the physical storage structure do not force changes to the applications programs. We refer the reader to the literature for discussions of the many virtues of data independence {R13,R24}.
A virtual information capability facilitates data independence in the following way. Suppose the database comprises the O_A in figure 4. Noting that the ASAL can be derived from WSAL, the database administrator decides to save storage space by eliminating the ASAL field. He creates O_B shown in figure 5. Without a virtual information capability, since τB_A has changed to τB_A', τO_A would have to change to τO_A'. Therefore, applications programs, for example, using τO_A would have to be modified. However, with a virtual information capability, a new table A' (shown in figure 5) can be defined in terms of τO_B. We must find an M_A' such that τO_A (in figure 4) is identical to τO_A' (in figure 5). Indeed, they are identical in figures 4 and 5.

In practice, it is impossible to guarantee that we can always find a suitable materialization mapping, M_t, because it is possible to change a basis in such a way as to lose information. More importantly, however, a change in basis B_t to B_t', and a corresponding change in M_t to M_t' such that τO_t is identical to τO_t', may give us a non-invertible mapping. Data independence will exist for retrieval operations, but not for store operations.
We say that an object O_t is storable at level k if the materialization mapping for that level is invertible. This problem has been addressed in the computer literature {R16} and in implemented DBMS systems. In implemented systems there are two possible reasons for non-storable virtual information:
(i) The system doesn't support some forms of storable virtual information.
(ii) The materialization mapping is not invertible.
Chapter 2.
Applications of the Virtual Information Concept
A Non-DBMS Application of the Virtual Information Concept
Data Reorganization/Translation
The process of data translation is one of changing the organization of the data in a database so that it can be processed on a new hardware/software system, or possibly more effectively on the same system {R1,R2}. Reasons for wanting to reorganize a database range from efficiency to necessity (e.g. new hardware/software). Figure 6 shows a simplification of the approach to the problem taken by members of the Data Translation Project at the University of Michigan {R1,R2,R3}. First, the source and target data are defined, with respect to structure, by a data definition language (DDL). Second, a mapping between the source and target is defined by a translation definition language (TDL). The Reader then reads the source database using the source DDL and translates the data into a common form. The Translator uses the source data in the common form and the TDL mapping to produce target data in a common form. Finally, the Writer uses the target data definition to convert the target data to the correct target format.
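The Reader, Translator, and Writer stages can be mimicked in a few lines of Python. Everything concrete in this sketch is an assumption made for illustration: the comma-separated source format, the bar-separated target format, the use of field-name lists as a stand-in for a DDL, and a target-to-source dictionary as a stand-in for a TDL. It is not the Michigan project's design.

```python
def reader(source_lines, source_ddl):
    """Read source records and put them into a common form (dicts)."""
    return [dict(zip(source_ddl, line.split(","))) for line in source_lines]


def translator(common_source, tdl):
    """Apply the TDL mapping (target field -> source field)."""
    return [{tgt: rec[src] for tgt, src in tdl.items()} for rec in common_source]


def writer(common_target, target_ddl):
    """Convert common-form target data to the target format."""
    return ["|".join(rec[f] for f in target_ddl) for rec in common_target]


# Hypothetical definitions standing in for the DDL and TDL descriptions.
source_ddl = ["name", "dept", "salary"]
target_ddl = ["pay", "employee"]
tdl = {"employee": "name", "pay": "salary"}

source = ["Ann,Sales,300", "Bob,Labs,450"]
target = writer(translator(reader(source, source_ddl), tdl), target_ddl)
print(target)  # ['300|Ann', '450|Bob']
```

The target records are derived entirely from the source records and the mapping, so the target database can be regarded as a view of the source data.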
[Figure 6: A data translation scheme]
[Figure 7: Data translation as an application of virtual information]
The data translation problem is easily restated in terms of virtual information. Consider figure 7, where figure 6 is redrawn in the context of virtual information. The problem is to find a mapping, M, such that the materialization of the target object has the desired structure at the target level. Thus, the mapping that we are looking for will give us our desired virtual view of the source data.
DBMS Applications of the Virtual Information Concept
Multiple Views of Data
A view of an object is simply the materialization of the object. Objects O_t and O_u are said to be different views of O iff O is a descendant of both objects in the representation graph. We discuss multiple views of data because much of the work on views -- especially in implemented DBMS systems -- has focused on multiple views.
Multiple views serve two major functions:
(i) The support of multiple data models, specifically the hierarchical, network and relational models.
(ii) The support of multiple users' views, each with its own requirements/authorization for a subset of the total database.
INFOPLEX, System R, the DBTG system, and the ANSI/X3/SPARC architecture all have been discussed in terms of multiple data model support {R15,R6,R13,and R5 respectively}. The difficulty in providing multiple model support is that the database representation at the level below the models should be flexible enough to efficiently support not only the three standard models (hierarchical, network, and relational) but new models as well.

Our purpose is to express the problem in terms of virtual information. Assume that O_D is the database model to be used to support hierarchical, network, and relational databases (O_h, O_n, and O_r respectively). In other words, O_D is the basis of O_h, O_n, and O_r. Without proof, we claim that O_h, O_n, and O_r must, in each case, be virtual. If the models were not virtual then τO_h, τO_n, τO_r, and τO_D would all be identical (since they have a common basis). If the materializations of different model views were always identical then little would be gained by supporting multiple models.
The support of multiple users' views is handled to some extent by most, if not all, DBMSs -- although the terminology differs. In IMS, the program communication block (PCB) specifies the mapping between "logical" and "physical" databases {R13}. In the DBTG system, the sub-schema definition specifies the mapping between the "sub-schema" and the schema {R13}. In System R, the view definition facility enables selection of a subset of one or more relations to be presented to the user as one view {R6}. The view support provided by IMS, DBTG, and System R is important for two reasons:
a. The user can view the part of the database he wants. This frees him from worrying about irrelevant fields, sets, tuples, etc. Since he sees only part of the database, other parts can be changed without affecting him -- thereby introducing data independence.
b. Views can be defined and permission to perform particular operations on data can be selectively granted to users. Thus views can be used for security purposes.
To see the connection between virtual information and the support of multiple users' views, we can look at a user's view as a view of a database with only a subset of the potential information. Obviously, the materialization of the user's view is not identical to the materialization of the user's basis, since the user's basis is identical to the basis of the database -- while the view is only a subset of the database.
Prevention of Errors
The prevention of errors comprises four areas: security, integrity, consistency, and reliability. These areas are defined as follows {R21}.
Security: prevent users from accessing and modifying data in unauthorized ways.
Integrity: prevent semantic errors made by users due to carelessness or lack of knowledge.
Consistency: prevent semantic errors due to interaction of multiple processes operating concurrently on shared data.
Reliability: prevent errors due to malfunctioning of hardware/software.
Virtual information is important to two of the above areas: security and integrity.
Security
Security is related to virtual information because it necessarily implies different information content available to different users. In fact, System R, DBTG, and IMS all provide security facilities in conjunction with their multiple view facilities. For example, in System R, permission is granted to specified users to perform certain operations on specified views (in System R a user's overall database view is a combination of views of smaller objects). Another way that virtual information can be used to provide security is through encryption techniques. For example, a view of an object could be defined as a function of the view of the secure object and an encryption key. Without insertion of the correct key, the object would be garbled upon materialization.
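The idea of a view defined as a function of a secure object and a key can be sketched as follows. The cipher here is a deliberately simple toy (an XOR with a SHA-256-derived keystream) chosen only so that the sketch is self-contained; it is an assumption of this illustration, not a recommended or historically used scheme, and the stored value is made up.

```python
import hashlib


def keystream(key, length):
    """Derive `length` pseudo-random bytes from `key` (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def apply_key(data, key):
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


# The secure object as it is stored (made-up contents).
stored = apply_key(b"salary=90000", b"right key")


def view(key):
    """Materialize a view as a function of the stored object and a key."""
    return apply_key(stored, key)


print(view(b"right key"))                      # b'salary=90000'
print(view(b"wrong key") == b"salary=90000")   # False
```

With the wrong key the view still materializes, but only as garbled bytes.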
Integrity
Virtual information supports integrity in two ways: data conversion and elimination of redundancy.
Data Conversion
Data type conversion comprises three related concepts: type conversion (for example, real to integer), unit conversion (for example, kilograms to tons), and scaling (for example, thousands to millions). Unit conversion and scaling can be thought of as a subset of type conversion. Type conversion aids integrity in two ways. First, it ensures that comparison and other operations between unlike objects are done by performing meaningful conversions. Second, it rejects operations between unlike objects where no meaningful conversions can be made. A discussion of the relationship of data conversions to integrity can be found in {R21}.

Type conversion is a virtual information function in the following sense. Assume that conversion is defined between types integer and real. Consider an object, O_t, whose materialization represents a real number. If O_t is the sole component of O_u, and if O_u can be manipulated as an integer, then τO_u is a virtual view of τO_t.

For an example of a data conversion facility, we can consider the MIMS system {R23}. Using MIMS terminology, each field in a record can have an associated unit of measure (u/m). The user's view of that field is then always in that u/m. To enable u/m conversions, MIMS allows the user to create a file called the UM file. Each record in the UM file comprises a u/m code (feet, seconds, kg), a u/m type (distance, time, mass), a u/m definition, and a u/m standard indicator. The u/m definition expression enables determination of the relative magnitudes. The standard indicator specifies the internal units in which all measures of a given u/m type are to be stored. Figure 8 shows an example UM file. Thus, referring to figure 8, if a field is defined with a u/m of feet, the internal representation (the materialization of its sole component) would be in inches, the standard unit for distance in figure 8.
MIMS U/M File

u/m code    u/m type    u/m definition    u/m standard
inch        distance    1                 yes
feet        distance    12                no
yard        distance    36                no
sec         time        1                 yes
min         time        60                no
hour        time        60*min            no
feet/sec    velocity    feet/sec          no
inch/sec    velocity    inch/sec          yes

Figure 8
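A unit of measure conversion driven by a table like the one in figure 8 might be sketched as below. This is not MIMS code: the dictionary layout and function names are assumptions, the "60*min" definition is expanded by hand to 3600 seconds, and the velocity rows are omitted to keep the sketch short.

```python
# code -> (u/m type, size in standard units, is the standard unit?)
UM_FILE = {
    "inch": ("distance", 1, True),
    "feet": ("distance", 12, False),
    "yard": ("distance", 36, False),
    "sec": ("time", 1, True),
    "min": ("time", 60, False),
    "hour": ("time", 60 * 60, False),  # "60*min" expanded by hand
}


def to_internal(value, code):
    """Convert a value in the user's u/m to the standard u/m of its type."""
    _, factor, _ = UM_FILE[code]
    return value * factor


def to_user(internal, code):
    """Convert a stored (standard-unit) value to the user's u/m."""
    _, factor, _ = UM_FILE[code]
    return internal / factor


def convert(value, src, dst):
    """Convert between two u/m codes, rejecting unlike u/m types."""
    if UM_FILE[src][0] != UM_FILE[dst][0]:
        raise ValueError("no meaningful conversion between unlike u/m types")
    return to_user(to_internal(value, src), dst)


print(to_internal(3, "feet"))      # 36
print(convert(2, "yard", "feet"))  # 6.0
```

A request such as `convert(1, "feet", "sec")` raises an error, which corresponds to rejecting operations between unlike objects where no meaningful conversion exists.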
A data conversion facility requires some sort of data typing facility. One proposal for such a facility is contained in {R22}. That proposal is couched in terms of 'domain' definitions for relational attributes. The domain definition comprises a description of the objects in the domain, an ordering, and an action to be taken in case of attempted violation of the integrity of the domain. A better approach might be to use the concept of abstract data types (this is pointed out in {R22}), in order to hide representation details. These issues will be discussed later in reference to INFOPLEX.

Redundancy of data means that multiple updates (inserts, deletes) must be done when one object is to be updated (inserted, deleted). Database integrity is then susceptible to processes that either neglect to modify all occurrences of the object or that are terminated (for example, due to a system crash) before all modifications can be completed. On the other hand, redundancy facilitates processes that only retrieve the redundant data since it tends to reduce inter-relation (record, set, etc.) references. For example, consider the network pictured in figure 9. Suppose we want to determine Kathy's school. We find her Person-School record via the P-PS set, since her school name (sname) is located in the Person-School record. We would have had to search in the P-PS set, were it not for the redundancy in sname.

[Figure 9: Example Person record, School record, and Person-School record (fields include pname, sname, GPA, and location)]
To eliminate the redundancy while maintaining the benefits of redundancy, we could make pname and sname virtual within the Person-School record. In fact, DBTG provides this facility with its VIRTUAL SOURCE clause {R13}.
Data Encoding
Data encoding is typically used to reduce the storage space required by objects. For example, fixed length lines of text might be compressed via the substitution of a special end-of-the-line character for trailing blanks. Data encoding clearly falls under the hat of virtual information, since the materialization of a decoded object would be different from the materialization of an encoded object.
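The trailing-blank example can be made concrete with a short Python sketch. The line length of 20 and the use of the newline as the special end-of-the-line character are assumptions for this illustration, and the text is assumed not to contain that character.

```python
LINE_LENGTH = 20  # assumed fixed line length
EOL = "\n"        # stand-in for the special end-of-the-line character


def encode(lines):
    """Replace each line's trailing blanks with a single EOL character."""
    return "".join(line.rstrip(" ") + EOL for line in lines)


def decode(encoded):
    """Rebuild the fixed length lines by padding each line with blanks."""
    return [part.ljust(LINE_LENGTH) for part in encoded.split(EOL)[:-1]]


lines = ["virtual".ljust(LINE_LENGTH), "information".ljust(LINE_LENGTH)]
encoded = encode(lines)
print(len("".join(lines)), len(encoded))  # 40 20
print(decode(encoded) == lines)           # True
```

The stored (encoded) text and the decoded lines differ, so the decoded lines are virtual with respect to what is actually stored.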
Chapter 3.
Virtual Information Functions in INFOPLEX
the
INFOPLEX
DBMS.
storage and
high
extremely
desire for and
comprises
a
a
and
retrieval,
speed,
storage
high
to
relevance
the
discuss
hierarchy
functional
of
reliability,
hierarchy for data
for
Our objective
performing information management functions.
is
in
INFOPLEX is the
for
motivation
The
phase.
design
currently
computer,
INFOPLEX {Rl5} is a database
virtual information to
INFOPLEX's functional hierarchy.
Three functions in the INFOPLEX computer are essentially virtual information functions. These are data typing (including conversion, unit of measure, scaling and encoding), virtual view definition (including virtual views of relations and attributes), and security. Although these functions are related, their placement in the INFOPLEX functional hierarchy cannot be at the same level. For example, data encoding must be done at a low level so that levels with data comparison functions can do comparisons on decoded data. Our proposal for placement of the virtual information functions in the INFOPLEX functional hierarchy is depicted in figure 10.
Intuitively, the security, integrity, and data typing (except data encoding) levels belong above the view definition level because security, integrity, and data typing may be expressed in terms of virtual views (and objects). Placement will be discussed later in slightly greater detail. A complete discussion of the INFOPLEX functional hierarchy is beyond the scope of this paper. The reader is referred to {R15} and {R28}.

[Figure 10: Virtual information functional levels in the INFOPLEX functional hierarchy.]
The Security Function

The security function, as stated previously, is to prevent users from accessing and modifying data in unauthorized ways. Although data access can be effectively controlled through encryption, a more general mechanism is necessary to prevent data modification. We assume that the method to assert the true identity of a user (the userid) already exists. We are then left with two major design issues:
(i) Location Strategies
    1. Keep access permission information for objects in a user profile.
    2. Keep access permission information for objects with the object itself.

(ii) Basis Strategies (not to be confused with an object's basis)
    1. Control access on a views basis.
    2. Control access on a query modification basis.
Logically, the user that creates an object should be solely responsible for granting access permission to other users. This is the strategy used in System R {R6}. If user A wishes to grant permission to users B and C, and location strategy 1 is used then A must access B's and C's user profiles--which may be located in a slow retrieval level of the INFOPLEX storage hierarchy (e.g. on tape). If location strategy 2 is used then only the object profile must be retrieved (which is likely to be readily accessible because the object is likely to have been recently accessed). Furthermore, location strategy 2 facilitates the granting of access to all users (System R uses a special keyword of PUBLIC for this purpose). Finally, the question "what users have access to this object" is easier to answer with location strategy 2. The only advantage of location strategy 1 is that access rights can be determined before an object is accessed. Overall, the better location strategy for INFOPLEX is 2.
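The trade-off between the two location strategies can be illustrated with a small sketch. The dictionaries below stand in for user profiles and object profiles; their layout and the function names are assumptions for illustration and do not describe System R or INFOPLEX internals.

```python
from collections import defaultdict

# Location strategy 1: permissions live in each user's profile.
user_profiles = defaultdict(set)    # userid -> names of accessible objects

# Location strategy 2: permissions live with the object itself.
object_profiles = defaultdict(set)  # object name -> userids with access


def grant_in_user_profiles(grantees, obj):
    """Strategy 1: one profile must be fetched and updated per grantee."""
    for userid in grantees:
        user_profiles[userid].add(obj)
    return len(grantees)            # number of profiles touched


def grant_in_object_profile(grantees, obj):
    """Strategy 2: only the object's own profile is touched."""
    object_profiles[obj].update(grantees)
    return 1                        # number of profiles touched


def users_with_access_s1(obj):
    # Strategy 1 must scan every user profile to answer the question.
    return sorted(u for u, objs in user_profiles.items() if obj in objs)


def users_with_access_s2(obj):
    # Strategy 2 answers from the single object profile.
    return sorted(object_profiles[obj])


touched_1 = grant_in_user_profiles(["B", "C"], "EMP")
touched_2 = grant_in_object_profile(["B", "C"], "EMP")
```

With strategy 1 the grant touches one profile per grantee and the "who has access" question requires a scan; with strategy 2 both operations involve only the object's profile.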
Query modification is really a form of view definition in the sense that the modified query corresponds to a new view that only lasts for the duration of the query. The INGRES {R13} DBMS uses query modification as a basis for security. We prefer to separate the view definition functions from the security function, and therefore favor the use of view definitions as a basis for security (rather than something the security level does) in INFOPLEX. This is the approach taken, for example, by System R, IMS, and DBTG.
The Virtual View Definition Function (VVDF)

The VVD function is to enable the construction of new objects from objects already in the database. This means specifying, for object Ot, Bt and Mt. As was mentioned earlier, most existing DBMSs have some (perhaps limited) VVDF. One of the most comprehensive VVDF facilities is provided by System R {R6}.
The System R user defines a view by preceding a query with the header "DEFINE VIEW" (see the example on page ?). The query itself is the view definition. A catalog holds all the view definitions. When an object is referred to, System R checks the catalog. If the name of the object is present in the catalog, the corresponding definition is substituted for the object referenced.

A System R view definition of object Ot explicitly defines Bt by simply referencing other objects. The retrieval operation (i.e., the materialization mapping, Mt) is defined by the query itself. A view-defined object cannot be updated unless there is a one-to-one correspondence between the object and an object defined in a base relation (System R's terminology for basic objects).
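The catalog lookup and substitution described above can be sketched as follows. This is an illustration of the mechanism only, not System R's implementation: the catalog is a plain dictionary, a view definition is a Python function standing in for a stored query, and the relation and view names are made up.

```python
view_catalog = {}  # view name -> defining query (a function of the database)


def define_view(name, query):
    view_catalog[name] = query


def materialize(name, database):
    """Return the rows of a base relation or of a view."""
    if name in view_catalog:
        # Substitute the stored definition for the referenced object.
        return view_catalog[name](database)
    return database[name]


database = {"EMP": [{"NAME": "Kathy", "DEPT": "R&D"},
                    {"NAME": "Lee", "DEPT": "Sales"}]}

define_view("RD_EMP",
            lambda db: [r for r in materialize("EMP", db) if r["DEPT"] == "R&D"])

before = materialize("RD_EMP", database)
database["EMP"].append({"NAME": "Pat", "DEPT": "R&D"})
after = materialize("RD_EMP", database)
```

Because the view holds no data of its own, a later reference reflects changes to the base relation; the objects referenced by the query play the role of Bt and the query itself the role of Mt.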
We propose a similar view facility for INFOPLEX, located below the data typing and security levels. If it were above the data typing level, then the data typing level would not be able to do type conversion for objects defined by the VVD level. Security cannot be below VVD; if it were then the security function could not be handled via virtual views.
Suppose that a user wishes to define a hierarchical view of a collection of objects already defined at INFOPLEX's n-ary level. The user writes a query to define the virtual (hierarchical) object in terms of the existing n-ary objects. That query effectively defines all operations that can take place on the new object. We assume that the hierarchical external view level provides a mechanism for mapping hierarchical operations into n-ary operations (although the mechanism for such mapping would be non-trivial). Since the query explicitly defines retrieval but not the modification operations (insert, delete, and update), the problem of inferring the latter exists.
In general we can permit modification operations in a virtual n-ary relation (for the materialization (M), but not as (object) atomics) if the materializations of the attributes of the virtual relation are invertible functions of the materializations of the attributes of exactly one relation. We can delete or insert a tuple only if we meet the constraint that the attributes of the virtual relation include at least one candidate attribute from each of the relations in the basis of the virtual relation, and if we meet the invertibility requirement mentioned above. In addition, modification is possible when deletes and inserts are possible.
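The invertibility requirement can be illustrated with a single virtual attribute. The sketch below assumes a virtual attribute derived from exactly one base attribute; the class, the attribute names, and the temperature conversion are all hypothetical and chosen only to show one invertible and one non-invertible mapping.

```python
class VirtualAttribute:
    """Attribute of a virtual relation derived from one base attribute."""

    def __init__(self, base_attr, forward, inverse=None):
        self.base_attr = base_attr
        self.forward = forward    # materialization of the virtual attribute
        self.inverse = inverse    # None when the mapping is not invertible

    def read(self, base_row):
        return self.forward(base_row[self.base_attr])

    def write(self, base_row, value):
        if self.inverse is None:
            raise ValueError("mapping is not invertible; update rejected")
        base_row[self.base_attr] = self.inverse(value)


temp_f = VirtualAttribute("TEMP_C",
                          forward=lambda c: c * 9 / 5 + 32,
                          inverse=lambda f: (f - 32) * 5 / 9)
name_len = VirtualAttribute("NAME", forward=len)  # many names share a length

row = {"NAME": "Kathy", "TEMP_C": 100.0}
temp_f.write(row, 32.0)  # permitted: TEMP_C is recomputed from the new value
```

An update through `temp_f` can be carried back to the base attribute, whereas an update through `name_len` has no unique base value and is rejected.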
The Data Typing Function

Each object has its own materialization mapping that - by defining the operations that can manipulate the object - determines the behavior of the object. It is reasonable to expect that many objects would have similar materialization mappings. If the materialization mappings were identical, they could be catalogued, and each object could refer to the catalog for its materialization mapping. We could refer to a catalog entry as an object type definition. Thus, instead of defining a materialization mapping for each new object, we could associate each object with an object type. Types would include the usual types (real, integer, complex, etc.) as well as composite data types (e.g. array of integers). Such a data typing facility would require extensive type parameterization facilities to enable parameterization of, for example, range constraints, size constraints, and new object types (array, relation, hierarchy, network, etc.).

In INFOPLEX data typing can be divided into two distinct levels, the data typing (DT) level and the data encoding (DE) level. The DE level is responsible for the implementation of encoding techniques for minimizing storage use. Its catalog would comprise encoding/decoding techniques. Further details can be found in {R28}.
The DT level is responsible for:

(i) ensuring that operations between objects of differing data types are rejected or that one or both objects are properly converted

(ii) enforcing (i) with respect to different units of measure and scale factors

(iii) allowing user definition of the materialization of objects at the next higher level

(iv) providing whatever support is deemed necessary towards an abstract data typing facility.
The advantage of typing is that the materialization mapping operations of all objects would be explicit and well-defined. The disadvantage is that it would be impossible to pre-specify all object types -- since the number of possible object types is infinite. A data type abstraction facility, similar to those used in programming languages, would have to be built {R19,R29}. The design of such a facility is non-trivial. No abstract data type facility for data bases has been developed in the literature. We prefer restricting INFOPLEX data typing to objects whose materializations have a relatively simple structure (e.g., real, integer, string, etc.). In addition, a limited data abstraction facility, along the lines of {R22}, would be quite useful.
To perform (i) to (iv) the DT level must parse all requests from higher levels in order to identify each of the objects referenced and each of the operations involved. Suppose the following request were issued:

    SELECT EMP.NAME,EMP.VACATION_TIME
    WHERE EMP.VACATION_TIME<EMP.SICK_TIME

The EMP relation and an example DT catalog are shown in figure 11. The DT level would convert the request to:

    SELECT EMP.NAME,EMP.VACATION_TIME
    WHERE REAL_GREATER_THAN(CONVERT_INT_TO_REAL(
    EMP.SICK_TIME),EMP.VACATION_TIME)

When the response is returned by the lower levels, DT must modify the objects EMP.NAME and EMP.VACATION_TIME so that they are materialized in the prescribed format.
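The conversion step can be sketched as a small rewriting function. Because figure 11 is not legible in this copy, the catalog below is an assumption: it supposes that the sick time attribute is stored as an integer and the vacation time attribute as a real. The operator names follow the converted request above; the function names are hypothetical.

```python
# Assumed types: figure 11 is not legible in this copy.
DT_CATALOG = {
    "EMP.NAME": "STRING",
    "EMP.VACATION_TIME": "REAL",
    "EMP.SICK_TIME": "INT",
}


def as_real(obj, catalog):
    """Wrap INT objects in a conversion; leave REAL objects unchanged."""
    obj_type = catalog[obj]
    if obj_type == "REAL":
        return obj
    if obj_type == "INT":
        return f"CONVERT_INT_TO_REAL({obj})"
    raise TypeError(f"cannot compare {obj} of type {obj_type} as a number")


def rewrite_less_than(left, right, catalog):
    """Rewrite `left < right` as a typed comparison for the lower levels."""
    # left < right is the same test as right > left.
    return f"REAL_GREATER_THAN({as_real(right, catalog)},{as_real(left, catalog)})"


predicate = rewrite_less_than("EMP.VACATION_TIME", "EMP.SICK_TIME", DT_CATALOG)
```

A comparison involving an object whose type cannot be converted, such as a string, is rejected rather than rewritten, which corresponds to responsibility (i) of the DT level.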
The above example illustrates two implementation considerations:

(i) Lower levels must be capable of handling the complex arithmetic and comparison operations generated by the DT level.

(ii) Virtual view definitions must be modified by the DT level before going to the virtual view definition (VVD) level. Note that if the VVD level were above the DT level then virtual objects could not be typed.
[Figure 11: The EMP relation and an example DT catalog.]
Conclusion
We have attempted to provide an information model and a corresponding notation with which virtual information concepts can be communicated. Virtual information is a collection of many related database concepts. The major fault of the notation is that while the concept of retrieving information from the database is fairly well defined in terms of the materialization mapping, the concept of changing the information content of the database is poorly defined. A more desirable information model would completely characterize the behavior of objects in terms of the operations that can be performed on them.
The implementation of virtual information in INFOPLEX was discussed in terms of general issues. Further details could be given only if assumptions were made as to details of the interface between the virtual information levels and other levels. Hopefully, this paper will help in the determination of those details.
Bibliography

R1. Merten,A.G. and Fry,J.P., "A Data Description Language Approach to File Translation", Data Translation Project, The University of Michigan, Ann Arbor, Michigan, April 1974

R2. Fry,J.P. and Jervis,D.W., "Towards a Formulation and Definition of Data Reorganization", Data Translation Project, The University of Michigan, Ann Arbor, Michigan, January 1974

R3. Fry,J.P., Smith,D.C.P., Taylor,R.W., Frank,R.L., Lum,V., Behymer,J.A., Schneiderman,B., "Stored-Data Description and Data Translation: A Model and Language", Information Systems, Vol. 2, pp95-148, Pergamon Press, 1977

R4. Tsichritzis,D. and Klug,A., "The ANSI/X3/SPARC DBMS Framework", X3/SPARC Study Group on Database Systems (X3 Project 226)

R5. Klug,A. and Tsichritzis,D., "Multiple View Support Within the ANSI/SPARC Framework", Proceedings on Very Large Data Bases, 1977

R6. Astrahan,M.M. et al, "System R: Relational Approach to Database Management", ACM Transactions on Database Systems, Vol. 1, No. 2, June 1976

R7. Folinus,J.J., Madnick,S.E., Schutzman,H.B., "Virtual Information in Data-Base Systems", Information Systems Group, Sloan School of Management, M.I.T., Cambridge, MA 02139

R8. Bach,M.J., Coguen,N.F., Kaplan,M.M., "The ADAPT System: A Generalized Approach Towards Data Conversion", Proceedings on Very Large Data Bases, 1979, pp183-193

R9. Navathe,S.B., Schkolnick,M., "View Representation in Logical Database Design", SIGMOD Proceedings 1978, pp144-156

R10. Navathe,S.B., Fry,J.P., "Restructuring for Large Databases: Three Levels of Abstraction", ACM Transactions on Database Systems, Vol. 1, No. 2, June 1976, pp138-158

R11. Brodie,M.L., Schmidt,J., "What Is The Use Of Abstract Data Types In Data Bases?", VLDB '78, pp140-141

R12. Smith,J.M., "A Normal Form For Abstract Syntax", Proceedings on Very Large Data Bases, 1978, pp156-162

R13. Date,C.J., "An Introduction to Database Systems", Addison-Wesley Publishing Co., Reading, Mass., 1975

R14. Cardenas,A., "Data Base Management Systems", Allyn and Bacon, Inc., Boston, Mass., 1979

R15. Madnick,S.E., "The Infoplex Database Computer: Concepts and Directions", Proceedings of the IEEE Computer Conference, pp168-176, February 26, 1979

R16. Codd,E.F., "Recent Investigations Into Relational Data Base Systems", Proc. IFIP Congress 1974

R17. Liskov,B.H., Zilles,S.N., "Specification Techniques for Data Abstractions", SIGPLAN Notices, Vol. 10, No. 6, pp72-87, June 1975

R18. Liskov,B., Zilles,S., "Programming With Abstract Data Types", SIGPLAN Notices, Vol. 9, No. 4, pp50-59, April 1974

R19. Liskov,B., Snyder,A., Atkinson,R., Schaffert,C., "Abstraction Mechanisms in CLU", CACM, Vol. 20, No. 8, pp564-576, August 1977

R20. McLeod,D.J., "High Level Domain Definition in a Relational Data Base", Proc. ACM SIGPLAN/SIGMOD Conference on Data: Abstraction, Definition, and Structure (March 1976)

R21. Eswaran,K.P., Chamberlin,D.D., "Functional Specifications of a Subsystem for Data Base Integrity", Proc. International Conf. on Very Large Data Bases, Sept. 1975

R22. Hammer,M.M., "Data Abstractions for Data Bases", SIGPLAN Notices, Vol. 8, No. 2, 1976, pp58-59

R23. Mitrol, Inc., "MIMS Request Reference Manual", June 1975

R24. Stonebraker,M.R., "A Functional View of Data Independence", Proc. 1974 ACM SIGMOD Workshop on Data Description, Access and Control

R25. Jensen,K., Wirth,N., "Pascal: User Manual and Report", Springer-Verlag, Second Edition, 1979

R26. I.B.M., CMS Users Guide, August 1977

R27. Chen,P.P.S., "The Entity-Relationship Model", ACM TODS, Vol. 1, No. 1, March 1976

R28. Hsu,M., "Architectural Specifications of the Functional Hierarchy of the Infoplex Database Machine", Unpublished Draft, Sloan School of Management, M.I.T., 1980

R29. Lockemann,P.C. et al, "Data Abstractions For Databases", ACM TODS, Vol. 4, No. 1, March 1979, pp60-75