Facilitating Transformations in a Human Genome Project Database *

Facilitating
in a Human
S. B. Davidson,
Dept.
Genome
and Information
University
of Pennsylvania
Email:
susan@cis.
Department
Science
Dept.
author:
B. Davidson,
Phone
(215)
interesting
lution,
Project
database
complex
challenges:
data
entry
the need to integrate
tems which
While
ical
range
these
are unusual
rapid
multiple
data
are not
and make
automat
these
Genome
Center
new approach
problems
database
developed
to a solution
and software
of models
intensity
and
ed solutions
The
22, and describe
by means
of a
acid)
beads
on
the
of
interesting
data
and
common
to
realm
and
perative.
This
either
paper
rapid
entry
range
These
exist
illustrates
over
make
are
techniques
to
problems
sources
variety
of
within
aid
in
their
in this
in this
paper,
in
the
of the major
schema
existing
problems
evolution
applications.
at
context
perimental
data
forcing
evolution
schema
and
to
Furthermore
being
po-
discussed
for
Chromosome
and
the
in HGP
resulting
22,
Children’s
and
better
laboratory
since
plan
and
data
integrity
guide
sion
dependencies).
plex,
non-standard
ongoing
enforced
This
gives
in order
for
the
rise
that
data
the
ex-
database
process
investigators
is
mod-
changing,
notebook
This
is crucial.
constraints
and
is constantly
applications.
to
experimental
developed
modeled
rapidly,
databases
need
must
consult
octhe
experimentation.
However,
is very complex,
hierarchically
organized,
an unusually
large number
of links among
or-
along
notebook
for
being
of the
related
extremely
database
and
the
known
database
laboratory
New
are constantly
Permission to copy without fee all or part of this material is
granted provided that the copies are not made or distributed for
direct commercial advantage, the ACM copyright notice and the
title of the publication and its date appear, and notice is given
that copying is by permission of the Association of Computing
Machinery. To copy otherwise, or to republish, requires a fee
and/or specific permission.
CIKM ’94 - 11/94 Gaithersburg MD USA
© 1994 ACM 0-89791-674-3/94/0011..$3.50
en-
as a less
as markers
faced
and
techniques
cur
the
involves
markers
Center
se-
Consequently
of Pennsylvania
domain.
*This research was supported in part by the following grants: NSF IR19004137, ARO DAAH0493G0, NIH P50-HG-00425, NSF BIR9402292 and ARO DAAL03-89-CO031 PRIME.
on the
directly
of Philadelphia.
One
ify
G’s
chromosomes
The
diflike
400 bases),
time.
Mapping
is the
Genome
University
for
fragments
a
discover-
and
for sequencing
anchoring
Chr22DB,
of four
arranged
means
T’s,
at one
as landmarks.
Philadelphia
at the
DNA
the
DNA
and
C, T),
C’s,
goal.
the
of
(deoxyribonupairs
G,
practical
bases)
is to
is composed
(approximately
mapping
chromosome,
rapid
solu-
set
(HGP)
comprising
techniques
strings
of identifiable
Hospital
im-
are
are not
serve
(A,
of A’s,
there
DNA
has
to
combination,
solutions
to
prob-
transforma-
Project
of DNA
Sequencing
intermediate
located
the
broader
bases
(3 billion
sitions
the
particularly
the
automated
or are inadequate
these
a wide
the
man-
data
their
HGP
dering
as within
where
the
Genome
chromosome
sequence
methods
genome
schema
constraint
data
of complementary
or
short
tire
ambitious
and
up
Although
a confluence
databases,
as well
databases,
Furthermore,
and
multiple
challenges
notebook
Project
complexity
do not
challenges:
to integrate
laboratory
of biological
intensity
present
which
formats.
Genome
databases
data
need
systems
and
Human
complex
and the
software
models
tion
database
evolution,
agement,
Project
step
of these
chromosomes
molecule
a string.
exact
current
Introduction
made
nucleotides
ing
transformations
Human
Each
ferent
string.
Genome
Genome
a first
core
to express
24 distinct
double-stranded
cleic
a
of the
the
quencing
Human
to be the
in which
genome.
long,
We
and constraints.
1
Philadelphia
describes
constraints.
goal
human
complexity
of the Philadelphia
database
perceive
a language
and
sequence
to biolog-
imperative.
for these problems,
at the
22, and
sys-
Chromosome
for expressing
89S-0587
Chromosome
what we
tions
and formats.
unique
for
lems:
and
en.upenn.edu
Fax (215)
solving
evo-
in the context
for Human
language
and data
of
management,
necessarily
combination,
a confluence
schema
sources
variety
their
illustrate
present
and constraint
over a wide
challenges
databases,
deductive
databases
PA 19104-6145
eckman@cbil.humg
S98-3490,
Center
Science
of Pennsylvania
Philadelphia,
Email:
of the
Genome
and Information
University
edu
Susan
of Genetics
of Computer
upenn.edu,
cis.upenn.
*
B. Eckman
Abstract
Human
Database
PA 19104-6389
kosky@saul.
Contact
Project
A. S. Kosky
of Computer
Philadelphia,
Transformations
to a number
need
the data
and contains
tables
(incluof com-
to be specified
to be correct.
Another
major
heterogeneous
is
frequently
notebook
as the
protein
ical
bibliographic
genome
laboratory
tems
perform
ject
and
This
different
haps
persist.
ple
queries,
it
is often
models
only
views
increase
in
of the
to
as a whole;
organize
dent
structuring
of
the
The
GenBank
family
is
“standard”
flat-file
variants,
National
veloped
at
ASN.
the
22 [7],
data
cept
from
Each
advantages
that
recent
been
vasive
o A
recent
report
of
[9] listed
that
were
current
data
sources,
among
was
issues
and
that
query
and
(b)
important
data
users,
a query
The
problem
ible
tools
research.
conand
dis-
query
others.
that
they
of
we
are
per-
Energy
In-
of simple
to
with
answer
the
sources
databases,
there
how
we
been
to analyze
a
to
and
to
as follows:
and
a data
trans-
encountered.
language
sample
current
have
and
than
desirable
is organized
transformation
the
used
large
evolution.
techniques
and
shows
We
fail
encountered,
Section
and
problem.
2
A Sample
Data
to
how
conclude
address
discussing
Transformation
The
data
and
notebook
and
difficult
know
little
to
in
for
start
modeled
the
the
the
future
in Chr22DB
archival
HGP
understand,
to nothing
therefore
ing
schemas
databases
about
what
some
complex
for
those
biology.
a bit
of the
laboratory
highly
molecular
off by explaining
and
and
are
especially
about
terms
who
We will
what
used
is be-
mean.
programs
into
and
information
fact
of
some
technique
for
of their
system
faced:
that
rapidly
HGP’s
View
intermediate
(fragments
and
(a)
the
the
the
attempt
listed
lack
DBMS
of
they
underlying
two
of the Biological
goal
of DNA)
locating
or
we will
consider
probes
and
an
were
them
schema
The
is the
that
an
or data
re-mapping
problem
of
trans-
is
is understandable
applications
evolution
databases.
by
program.
calls
Back-
into
the
We
need
string.
interest
is
a linear
ordering
Sequence
cut
the
to be able
of DNA
neighboring
overlap
using
cloned
randomly
into
are then
representing
from
A variety
manipulable
pieces
To discover
pieces
chromo-
positions.
mapping
These
come
ordering
human
(STS’s).
bases).
of two
pieces
Sites
million
it is crucial
sequence
flex-
physical
experimentally
string.
the
markers
to specific
lothe sake of simplicity,
of
DNA
fragments,
for
Tag
of
pieces
mapping
along
at known
one:
chromosome
(50,000-1
original
only
Sequence
overlapping
bled
this
form
language,
The
markers
some
evoking.
part
A Databaser’s
ground
the
are dis-
is no effective
2.1
of techniques
are used to anchor
cations
on the chromosome.
For
they
the
of schema
for
because
language
constantly
forming
de-
them.
a genome
adequate
using;
our
is easy
is
of as a
rather
of Chr22DB
has
this
forms
is highly
paper
that
to capture
arguing
that
a part
problem
it
right
for
is rather
database
of rapid
of this
2 illustrates
problems
a number
et al [10] in an appraisal
create
major
impossible
various
files,
combining
Goodman
to
in a form
reason
be thought
output
– a whole
accep-
are the
computational
whose
in light
remainder
it is used
The
Furthermore,
query
universal
they
can
the
(datalog
transforms
problems
that
and
complex
about
The
by
in which
simple
the
reason
gained
we believe
transformation
relation.
have
yet
and
languages
Chro-
among
the
indicate
single
Los
version,
to a biological
a Department
“summit”
structured
An
and
for
base
advantages
portability,
is one
a declar-
from
of representation,
underscore
to,
queries
for
●
and
it
very
formation
the
1 version
which
are
on
transformations
transformations.
a data
ushas
databases:
formatics
tributed
[8],
view
issues
papers
HGP
group
while
Section
syn-
developed
Center
that,
to data
it
that
is based
query
not
approach
illustrate
transformations
declarative
have
our
and
approach
languages,
(as in
sources).
for specifying
extensions)
as query
3 describes
is
at the
ASN.
its
While
approach
data
trivial
one knowledge
entry
include
alluding
to
the
these has its own
expressiveness,
Two
have
of
there
developed
Philadelphia
our
the
information.
point:
version
sim-
acceptable
indepen-
numerous
[6],
at least
a sequence
view.
language
with
by the
within
similar
in
a relational
and
developed
or
version
Laboratory
NCBI,
1 version
mosome
also
version
a relational
Alamos
the
same
and
Our
TSL,
fam-
models
is to describe
of data
(as
models
Genbank
data
multiple
model
data
as in the
transformations,
DB.
language,
structurally
per-
beyond
application
to achieve
Thus,
we find numerous
a case
and
data
complexity
necessary
the
increases,
partial,
in Chr22
ative
query,
within
to optimize
a specific
system
performance.
tactic
(Ob-
paper
problem
tance
databases
complexity
capture
different
tasks
string
flat-relational
from
of the
ing
a sample
and
trans-
data
different
multiple
of data
data
a single
between
screens,
to specifying
transformations:
in
or across
purpose
constraints.
databases.
and
As data
may
significantly
and
entry
integration
arisen
which
databases
in schemas
databases
analysis
search
include
personal-computer-based
to
of
computa-
complex-relational
heterogeneity
is likely
Staden,
such
object-oriented
GemStone),
human
data
evolution),
data
The
as software sys-
as pattern-matching,
(Sybase),
(ASN.1),
HGP
involving
biomed-
to
schemas
of databases),
the
Genbank,
the
schema
ily
number
and
with
approach
between
(as with
queries
the
agroting
analysis
the
databases,
and
as well
databases
formations
answer
[1],
[4],
These
Store,
[2];
a principled
packages
of
database,
PIR
[3], FASTA
data
comparison.
databases
to
Medline,
GDB
multiple,
contents
archival
base,
databases;
as BLAST
problems
and
base,
base,
complex
the
sequence
data
to
software
include
data
notebook
such
tional
acid
data
access
and
augment
These
sequence
map
that
databases
nucleic
the
as
to
by researchers.
such
is
databases
needed
laboratory
posed
problem
remote
their
order
relative
between
in the
ordering
to ascertain
overlaps,
size
reassem-
when
that
sites
in
the
two
pieces
is,
of
the
when
original
of
DNA
can be detected
the
a probe.
The
them,
desired
linear
and
related
and
interest,
and
visible
tion
lines
larger,
quence
contains
of a tiny
1.
become
used
to
size
range
and
time
cloned
be
the
se-
2.2
to
which
at
the
denote
markers
the
(probes).
of
pat-
The
func-
of granularity.
Horizontal
lines
the
sequence
data
to
are:
human
in
physical
Cloned
of
the
In cloning,
probes
and
In
are
what
and
and
STS’S
carrier
these
STS
It
consists
When
of the
the host
human
Sequence
Tag Sites (STS’s).
cells
DNA
DNA
merely
introduced
lations
rather
tributes
for
attribute
defined
cal reaction
pai~
used
as primers
amplification
stages,
each
An amplification
primer
sequences
the
demonstrates
items
a primer
amplification).
several
perature.
less the
within
called
(PCR
reaction
prises
by
intervals
about
test
sequence;
sequence
an
STS
The
proceeding
reaction
are found,
therefore,
its
name,
entire
multiple
2.
sub-
Since
be one
row
relation
rows
in
the
screen
must
be
Chr22DB
schema¹
in
of
linkages
a precise
given
primary
rele-
3; this
between
is
re-
semantics
of
the
tables
and
at-
relational
are
the
Figure
below.
Uppercase
keys.
lMIE,
pnblicxmne
)
erwilid,
date-picked,
strand)
pr2-primer_id,
PCR-prod-size_hi)
(STSID,
.t ime,
AHPLHACEIIIE,
denat
-t smp,
AIiIIEALIEHP,
denat
4 tie,
. . .)
chain
reaction
comtem-
In
will not occur
unproperly
spaced,
with
this
database
nested
lational
reaction
Important
including
give
pickmethod,
ions
Aenat
Primers
the
LAB-CODE,
PCR-prod~izelo,
init
the
pickedfiromnaint
pri-primer-id,
three
(relational)
relevant
denote
Figure
Location).
is shown
schema
ing-temp,
PCRxxndit
a chemi-
at a different
a successful
this
used
will
(EER)
to
a form
in
with
data-entry
convey
of the
names
STS (ID,
of sequenced
to start
by the polymerase
containment.
are:
a pair
to
names (SATSRIAIJD,
primer
(ID, pname,
is an interval
the
may
map
relations.
underlying
than
in
melt
STS
at
of
the
must
there
in
and
conceptual
Some
produced,
rows
on
of
database
is shown
ions,
pair,
view
the
(STS)
Location
the
are captured
example
STS,
of Chr22DB
schema.
experiments.
An
A
portion
of
are cultured,
are
entered
database.
vant
two
and
to
An
involved,
application
relation
primer
ions
and
data
is
generator
by hand.
data
the
PCR.condit
trans-
applications
of the data
view
a single
STS relation,
Data
application
schemas.
illu-
of structural
since
and
case
a good
entry
the
notebook
enters
the
provides
a specialized
The
Though
is a special
data
provides
(Primers,
screen
transformed
we
or interval
vector
two
hand.
entry
to be done
of a complex
PCR.condit
in-
the
from
Importing
transformations,
by
complexity
applications,
lab
data
the complexity
that
center.
largely
and
has
spreadnotebook
as directly
at the
data
in structure,
enter
the
done
database.
denoting
2)
maintained
or
entry
between
actual
are
follows,
a fragment
into
cells.
replicas
acid
and
of sources:
preexisting
as well
consuming
handle
to
a variety
involves
been
of the
form
relations
probes
information
in future
of DNA
of the
derived;
laboratory
out
sources
modification
widely
each
mapping
1) Cloned
to be used
nucleic
were
GDB,
centers,
carried
time
underlying
of a Chromosome
in” freezers,
is inserted
or yeast
exact
(PCR
name
site.
from
as
Rewriting
can not
the
differ
comes
other
of some
In data
by Chr22DB.
Probes.
bacterial
temperature
Transformation
transformation,
enormously
a database.
some
DNA
melt-
expected
process
the
primers
of the
glamorous,
formations.
in
(STS’S).
stored
stored
Cloned
many
used
Sites
probes
the
GDB;
object-oriented
date
a data
a screen
Mapping
Chr22DB
reagents
about
the
and
the
of the
to
such
particularly
of probes
describe
stage
Database
these
has
and
1: Physical
formation
which
being
from
and
in
briefly
primers;
product;
location
from
experiments
Tag
from
sequence,
the
each
in Chr22DB
databases
se-
is shown.
physical
for
databases,
tools
Sequence
name,
of
amplified
databases
sheet
whose
Below,
data
archival
in
the
a cross-reference
probe
stration
types
the
A Sample
of
Two
of
required
chromosomal
not
represented
it;
of each
this
banding
fragments
sequence.
of DNA
DNA
top
themselves
level
DNA
marker
to
the
which
coarsest
overlapping
substring
Figure
named
temperature
conditions);
the
sequence
thought
At
with
a microscope,
as landmarks
Vertical
a lin-
then
relationship
Figure
is depicted
under
denote
yields
is contained
as regions
its
in
a chromosome
terns
be
ratory
ing
called
disease.
mapping
is illustrated
figure,
pieces
sequence
may
contain
fragment,
probes
such
to inheritable
Physical
quence
The
landmarks
of special
whose
versa.
sequences
shorter
on the
probes
vice
their
much
ordering
on the
map
areas
that
of a third,
ear ordering
in
by showing
sequence
lated
transformation
subrelations
schema
tables.
with
The
a complex
is flattened
value-based
atomic
into
re-
linking
re-
pointers
attributes
of
relation
a standard
the
top-level
data
the
labo-
¹ The schemas in this paper were all drawn using ERDRAW [13].
[Figure 2: STS data entry screen]
relation
target
na_interval,
composed
and
into
material,
the
End
Q1l
posltmn
two
by
lab,
STS. The
pos,txm
rows
in
erval,
the
two
target
Data
to accomplish
insert
normalized
data
name
fields
in
are mapped
names
table,
which
of the
object
being
maintain
relies
be
generated.
on
internal
to accomplish
data
dency
the
also
hold.
least
among
re-
data
the tardepen-
name
one
the
more
example,
GDB
at least
complex
each
(i.e.,
material
names.
non-GDB
constraints
lab-code
name
(i.e.,
have
imply
certain
get
database.
=
“GDB”
names.
A Language
Constraints
for Database
The
proposed
believe
for
expressing
reasons
ify
and
data
tational
ming
reason
express
types,
a deductive
transformations.
transformations
about;
the
the
though
it does
and
of
finally,
is the
best
choice
There
are
several
should
language
structural
expressiveness
language;
approach
the
of the
code
We
will
start
then
giving
should
be
manipulation
not
need
a general
the
of
to have
the
purpose
language
able
to
compu-
program-
should
unify
level.
generator
that
rule
for
by
generated
previous
and
how
programs.
the
The
logical
stages:
each
for
rule
database
normalised
appropriate
inferences
rather
than
easy
are
many
the
are
to
adaptation
of database
of the
section.
they
genera-
in two
target
the
and
for
lan-
it is straightforward
a variety
syntax
two.
languages.
the
allowing
explaining
the
the
this
code
form,
level,
Further
of constraints
been
forms,
code
means
at the
for
al-
transforma-
work
database.
into
of the program,
in the
transformation
complex
data
core
examples
normal
be easy to mod-
once
at the
have
source
approach
only
described
the
converted
in
using
will
entry
and
between
a normal
tar-
transforma-
programming
generators
and
logic
database
nonrecursive
a complete
re-use
model,
and
is
Not
trans-
source
be expressed
to
from
then
the
interactions
converted
times
eral
data
for this:
and
easily
that
code
This
the
there
two.
a transformation
Horn-clause
about
of database
are
how
are
on
on
the
determining
be implemented
a variety
rules
that
We
can
programs
for
performed
)
and
unambiguous
tion
in
but
transformations
but
tors
a part
reasoning
since
between
constraints
constraints
can
constraints
databases,
is based
formal
only
DBMS.
lab_code
Transformations
language
for
is generated
at
play
may
rules
integrity
of interaction
constraints
specifying
may
must
and
level
between
First
“GDB”).
3
do
guage,
system-
links
COmC.Sn’c,
formations
Not
The
transformed
For
one
and
integrity,
but
Notebook
Our
ap-
to the integrity
constraints
of
Preeminent
are key and inclusion
constraints,
strand
RV
w
Screen
lows
tables.
To
. .. .
:%
Entry
only
is de-
transformations,
must
schema
identifiers
must
conform
get database.
#
the
statements
target
generated
tions,
propriate
lated
c!@e,
30
a significant
inserted.
In order
transformations
tables
interval,
GDBJ.ocus)
identifier
interval,
subrelation
The
and
internal
:7
2: STS
6 relational
na-int
sequence.
the
t~
12
Un,t,
BANDS
material,
Primers
tables:
(STSmame
separate
are linked
over
names,
5 target
screen
start
Q1l
distributed
primer,
entry
to
are
schema:
Extend.
:7
Figure
screen
telq
CONDITIONS
PCR
Machine
PcR-9600
in the
systems.
underlying
language
transformation
data
entry
Finally,
used
in
data
with
sev-
clauses
application
we describe
implementing
Figure
3.1
Data
The
language
model
is based
allows
formations
is similar
nesting
of set
relational
resent
the
rently
the
a wide
to that
and
gaining
(not-null
data
data
[12].
can
be used
referencing
a row
relations.
The
values
can
not
also
data
The
for
allows
The
Our
language
scribe
than
us to rep
relation.
various
based
that
are cur-
ILOG,
the
model
allows
functions
to generate
functions
to create
can
applied
an entirely
new
as an object
in some
type
identifier,
particular
system
generated
object
be
variables
value,
from
for the language
by
two
distinct
els
established
cies,
support
certain
straints,
some
while
there
occur
in
of these
language
categories.
provides
of constraints,
object
important
other
of constraints
inclusion
databases
and
Rather
including
but
than
to express
but
not
that
language
to
of Datalog
with
database
non-recursive
our
Datalog
consider
atflat
or ILOG
that
though
for
many
the
with
in non-recursive
concept
([15]),
of
so that
be recursive
could
Datalog.
model,
a very
general
limited
the
any
data
of existing
nested
for
query
gree of rule
sically
above.
used
a database
concerned
is used
and
and
to describe
necessary
primarily
of this
with
the
work
and
the
a query.
manipulation
are
in
implement
involve
manipulations
satisfy
of IQL
established
may
optimization,
the
to
In
they
it
([12]) to
manipulations.
to express
though
rewriting
for example
of ILOG
data
constraints.
languages,
novel
of as an
or as a restriction
structural
contributions
language
some
be thought
languages:
model,
with
significant
transformations
tive
to those
deductive
relational
the
incorporate
basically
to be an extension
dealing
most
way
does
it could
be considered
The
fall
language
evolution
([11])
which
do not
the
features,
the
How-
making
in our
only,
than
we do not
syntactic
could
support
dependencies
primitive
a means
may
con-
identity.
databases
Though
mod-
dependen-
and
programs
of
dealing
concerned
and
to the
of Datalog
less than
a
a practice
with
limited
that
strictly
are
only;
power
than
When
is weaker
which
When
in
logic-
as predicates,
relations
expressive
we
rather
tuple
as Datalog
used
values
when
stratified-negation.
de-
established
such
to base
but
or
clauses
various
data
functional
object-oriented
and
classes
incorporate
relational
other
of inheritance
many
any
models
dependencies
remain
such
mentioned
or
biological
into
family
keys
existence
concept
ever
our
data
as primitives:
the
be expressed
from
are
considered.
negation,
clauses
of constraints
may
case,
recursion
be confused.
Many
are being
transformation
that
functions
not
kinds
bound
a relation
of an entire
names
awkward
transformations
other
ensures
Skolem
are
in
indi-
variables
of a transformation,
languages,
relation
the
and
Individual
it differs
be seen to be greater
with
of
respect
to the
tuples
construction
query
becomes
without
which
or as a way
relation
can
a group
part
the
database
relational
identities
to
values,
of tuples).
conceptual
In this
which
access
or tuple,
(sets
in which
and
or optional
independent
of a relation
to simple
describing
in the
required
one
extension
found
In addition,
allows
relations
models
to be either
Language
be bound
entire
a repre-
STS’S.
components
can
arbitrary
for
3.2
tributes
Skolem
in order
the
and
Database
vidual
null).
of values
then
for
is a natural
but
data
trans-
models.
allows
structures
popularity.
We use Skolem
as in
of data
and
It
model,
of records
or
relational
implement
constructors,
object-oriented
attributes
and
range
identities.
complex
and
a nested
of [14],
tuple
of object
of the
around
us to describe
between
model
semantic
of Target
Model
which
sent ation
3: Schema
deducsome
rules
of data
Here
d-
are bafrom
we
of the
are
rules
themselves,
which
and
the
specified
with
in a clear
where
the
efficient
rules
can
them
and
and
database
3.2.1
converting
transformations
meaningful
then
from
constraints
a form
manner,
be easily
programming
in
and
are logically
into
translated
@
a form
into
φ ::= P = Q            — equality
    | P ≠ Q            — inequality
    | P ∈ Q            — set-inclusion
    | P ≥ Q | P ≤ Q    — arithmetic predicates
    | Undef(a)         — undefined optional attribute
    | False            — contradiction
some
language.
–
Types
Types
in
stract
our
language
are
given
by
the
following
ab
syntax:
Here
t ::= {t}                          — set type
    | (a1 :* t1, ..., an :* tn)    — record type
    | int | string | ...           — base type
the
tuples
type
in
{t},represents
A set type,
t. A
tuple
type
(al
or
records
with
required
tl,
...,
either
:, for
attribute,
a required
Base
atomic
types
type
is a type
In a flat-relation
base
types.
as a whole
variables
type
to have
types.
for
and
attributes
the
type
language
source
a unique
of
...,
so a
. . . . an
t~
type
we consider
not
in any
An
attribute
term.
a tuple
all
relations
types
will
and
so on,
type.
is strongly
typed,
target
databases
can
be inferred
for
whose
such
be
type
each
type
of which
in that,
term
in
tuple
#
are built
of a tuple.
The
and
Atomic
example
main
ranged
syntactic
over
by
elements
of our
P, Q, . . . . and
language
atoms,
are
ranged
an
in the
and
atom
X
relation
further
by
the
::=
Src
[
Tgt
following
abstract
and
by
P.a
I
f(R,
They
I
P(+I,..
:
target
—
constant
—
—
variable
P,f
., #k)
)
X
term
Id
same
field
value
as
= (equal-
Undef
the
of an optional
used
False repre-
in
checking
~ STS would
mean
the
that
X
use a compound
on X:
= PI,
= P2)
is
attribute
pr2_primer_id
3.2.3
c Tgt.STS
a tuple
1,
in
the
target
pr1_primer_id
attribute
relation
attribute
P1
P2.
Clauses
has
the
form
database
database
ψ ⟸ φ1, ..., φn
The
attribute
atom
Not
—
Skolem
function
–
compound
all
A clause
term
type
# is called
form
41 ,...,42
— projection
. . . .
attribute
while
syntax:
source
~:
I
represents
and
STS. We could
restrictions
that
id
A clause
P
X
predicate
is
= I, prl-primer-id
means
STS with
terms,
over
represent
values
in a database,
4,*,
. . . . Terms
atoms
are the basic building
blocks
of formulae.
are defined
the
as
com-
Formulae
which
The
so
&
the
predicates
nullary
pr2-primerid
Terms
some
the
for the definedness
situation,
X(id
to P:
compound
E (set inclusion),
check
error
to put
and
has
the
atoms
is evaluated
represents
the binary
and
y of a transformation.
term
a trans-
using
validlt
For
has
41, ...,
within
of the
Id
is the
the
of
variable
equivalently,
an
is a tuple
it
relative
term)
&
P.a
. . . . #n)
one
STS,
term
sents
types
as well
is defined).
with
occur
if the
atom
the
y)
which
attribute
as
type,
X. Id.
(ineqndit
program.
3.2.2
then
P(q$l,
relation
an
or,
term
predicates
with
in
X,
tuples
(if it
carries
a, must
example
@k), then
of the
Atoms
of a transformation,
each
of the
it y),
would
given
form
compound
target
occurs
.,.,
For
schema
as STS, primer,
a tuple
Id
target
of base
and
tuple
P but
term,
For
regarded
a tuple,
of the
term
smaller
the
X(&,
that
in the
relations.
database
relation
and
or classes
of those
relations
erval,
databases
in
and
are
always
to sets
of the
term
(but
source
type.
representing
as the
P.s.
pound
:*
are
type is a tuple
are
are to be interpreted
41, . . . . ijn which
any attribute
term,
a, occurring
in
represent
:* tl,
tl,
value
the
which
be bound
a attribute
A compound
an optional
tuples,
can
of base
of the
same
one
represent
Constants
P is a term
value
of types
so on
{(al
types
for the
3, with
na-int
STS, primer
formation
form
the
of the
to the
be of an appropriate
Our
a set
be
A database
each
going
in Figure
sequence
:“P, for
to
relations,
a database
is shown
or
and
of the
as individual
and
example
. . . . an,
:* representing
string
If
ype
tuples
term
attributes
database
int,
is considered
t~)}
As well
al,
symbol
oft
represents
values.
relation
relation
sets of values
attributes
each
simple
with
finite
. . . . an :* tn)
tn respectively:
attribute.
A
:* tl,
Tgt
a transformation,
of relations.
while
types
contradiction
Src and
databases
as to values
for
equality
::=
T,,.
the
syntactically
is said
to
and
target
typed with
respect
the types
of terms
when
we take
the
the
head
body of the
correct
be
clauses
well-formed
database
to Tar.
occurring
term
of the
Src
and
in
to
clause,
while
clause.
are
for
type
meaningful.
source
TtGt
if
database
it
is
well-
Ttgt, meaning
the
have
clause
the
that
au
make sense
type
T,,.
and
Tgt
to
The
concept
([16]),
have
and
stricted
the
can
in this
paper
each
over
some
will
of
instantiation
such
of a clause
target
of database
denote
for
the
on the
which
values
are
tence
dependencies
for
In
determining
a
to
term
v,
Src to
+
which
which
get
vari-
For
generating
part
in Figure
3 from
the
head
of the
source
of
if it is true
example,
and
(id
the
= Y
+-
X(id
the
A
transformation
target
terms
any
following
of the
says
that,
for
any
two
STS,
if
X
and
Y
have
id
is a key
attributes
then
tribute
a constraint
rather
The
the
terms
which
target
constraint
Ttgt
re-
when
~ and
tion
Tgt
= P12,
PCR.prod-izeJo
to
is carried
values
in determining
of a transformation
will
now
for
= SIf)
((pname
inclusion
there
atof
may
= PN2)
E primers,
There
database.
only
tion
A
only
source
Primers
validity
play
some
more
in
that
for
body
The
as
examples
Figure
every
is a corresponding
3.
of
primer
entry
id
in the
in
for
~ Tgt.primer
this
the
descrip
This
and
of
each
al-
attribute
in the
is bethe
tuple
use of the
pname
two
in
the
in
target
data-
the primer_id’s.
turn
generated
by
primer
= PN,
pickmethod
= DP,
strand=
= PM,
ST)
G Tgt.primer
((pname
= PN,
prnethod
= P)
one
a different
are
= MT,
date-picked
+
Y(prl-primer-id
f_STS
Also
relation.
to lookup
relation
= f_primer(PN),
+
= P)
makes
in order
melting-temp
table:
X(id
clause
deserve
clause:
(id
an
of
only
atoms
valued
presence
primer
tuples
con-
Firstly
has
separate
set
that
function
STS relation.
STS_screen
is
of this
base relation
be counted
clause
Skolem
sub-relation.
The
of the
a significant
in two
the
the
the
relation
attribute
asserts
the
for
STS_screen
in
this
that
ids
it occurs
the
about
notice
generate
of a tuple
atoms
points
Firstly
another
shown
6 Tgt.primer,
< P12
are several
cause
target
program.
at
= P12)
prirnersj
a transforma-
the
and
terms
target
E Src.STS_ecreen,
id
the
and
= SH)
= PN2,
though
as source
= SL,
(pname
PI1
to
also
2:
6 Tgt.STS
E Primers,
G Tgt.primer,
is used
to ensure
shown
in Figure
= PI1)
of databases.
after
tested
shown
id
a pair
target
screen
= PN1,
comment.
containing
tar-
clause
schema
(pnsme
one database
contains
database
dependency,
STS table
the
of the
= PN1)
PCR..prod..sizhihi
only
transformations,
look
the
the
be
in order
part
words
is an example
constraint
Constraints
part
We
in
a clause
may
out
id
database,
head,
for
= SL,
PCR-prodsize-hi
relation
their
can be classified
source
the
1, on
other
concerns
the
a source
transformation.
In
clause
between
in
Constraints
straints
equal.
is then
while
Y in
value,
its
P12),
pr2-prime~id
= 1) E Tgt.ST’S
and
STS. This
connection
denote
X
same
which
values
terms
terms.
are
in a clause
denote
terms,
the
for
a clause
than
which
they
tuples
clause
in
is a transformation
entry
= P~I,
(pneme
Y(id
two
of clauses
Undef atoms
STS relation
the data
= f-STS(P1l,
clause
= 1) c Tgt.STS,
class
only
PCR-prodsize.lo
X
between
in a special
contain
exis-
model.
clauses.
prl.-primerid
+
example
relational
not
and
truth
A pair
T8T~ and
not
could
functional
terms.
v.
For
does
constraints
transformation
contains
is an
the
value
and
the
transformation
one
there
evaluated.
the
called
is
of these
traditional
we are interested
of the
a clause
denote
lan-
@
of types
satisfy
databases,
types.
Clearly
values
a
considered
in
it is being
p and
said
with
clause,
true.
two
the
The
relevant
variables
@ is also
is dependent
spectively,
the
well-formed
remaining
that
databases
we take
a
for
last
using
is re-
of the
clauses
the
clause
together
the
that
be expressed
Datalog
values.
if, for some instantiation
@l ,...,
q$~ are true,
then
of the
clause
of
semantics
All
be well-formed
meaning
set
of the
in [17].
from
in the
finite
Note
range-restricted.
restrictions,
presentation
be found
is
variable
of these
41,. -., d~, is that
ables in the body,
the
it
is taken
that
detailed
guage,
The
and
range
definitions
more
Tt9t,
means
to
formal
type
of range-restriction
strand
∈ Tgt.STS
melting_temp
= PM,
= ST)
= MT,
date_picked =
DP,
∈ Primers)
∈ Src.STS_screen
Next, that each material has exactly one GDB name:

X = Y ⇐ X(material_id = M, lab_code = “GDB”) ∈ Tgt.names,
        Y(material_id = M, lab_code = “GDB”) ∈ Tgt.names

And, finally, that a public name cannot be a GDB name:

False ⇐ (public_name = “Yes”, lab_code = “GDB”) ∈ Tgt.names

We will see in Section 3.3 that it is necessary to unfold clauses like this, in order to get a clause that refers only to source relations in its body and only to target relations in its head. Clauses of this form can be processed in one-pass without referring to the target database.

3.2.4 Transformation Programs

A transformation program from database type Tsrc to database type Ttgt consists of a set A of transformation clauses that are well formed for Tsrc and Ttgt.
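
As an informal illustration only, the two constraints on Tgt.names given just before Section 3.2.4 can be read as checks over a set of tuples. The sketch below is not part of the system described in this paper: it assumes that Tgt.names is held as a Python list of dictionaries, that the attributes are called material_id, lab_code and public_name, and that an extra attribute called name exists; the function names are hypothetical.

```python
from collections import defaultdict

GDB = "GDB"

def duplicate_gdb_names(names):
    """Return material ids that carry more than one distinct GDB tuple.

    Mirrors the clause  X = Y <= X(...) in Tgt.names, Y(...) in Tgt.names:
    two GDB tuples for the same material must be the same tuple.
    """
    seen = defaultdict(set)
    for row in names:
        if row.get("lab_code") == GDB:
            # Freeze the tuple so distinct rows can be counted in a set.
            seen[row["material_id"]].add(tuple(sorted(row.items())))
    return sorted(m for m, rows in seen.items() if len(rows) > 1)

def public_gdb_names(names):
    """Return tuples matching the body of the False clause."""
    return [row for row in names
            if row.get("public_name") == "Yes" and row.get("lab_code") == GDB]

def satisfies_constraints(names):
    """True when neither constraint is violated."""
    return not duplicate_gdb_names(names) and not public_gdb_names(names)

# Made-up sample data, for illustration only.
sample = [
    {"material_id": 1, "name": "name-A", "lab_code": GDB, "public_name": "No"},
    {"material_id": 1, "name": "name-B", "lab_code": "LAB1", "public_name": "Yes"},
]
print(satisfies_constraints(sample))
```

Note that the first clause, read literally, rules out two different GDB tuples for one material; it does not by itself force a GDB tuple to exist, so the check above tests only for duplicates.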
to
If
A
is such
database
type
iff,
a transformation
value
Ttgt,
for
of type
then
each
Tsrc,
v is said
clause
to
u and
program
iff,
if
a A-transformation
there
exists
smallest
such
transformation
data
source
gram
by
imply
get
database
but
being
in the
program
what
certain
does
not
smallest
there
formation
be
carried
out
values
into
done
“one
in
the
the
database,
target
transformations
in
describe
which
which
the
target
database
is then
used
target
database.
The
problem
tional
for
program
model
the
flat
be found
more
relational
in
delicate
model
and
than
([18,
19])
a “selectdatabase
form
or
clauses
can
other
suitable
some
recursive
a tuple
be
2) to
by combining
in
some
our
of the
this
only
relation,
clause
the
STS
(Figure
would
process
it
of the
programs
will
termi-
description
follows
that
the
complete.
a normal-form
from
of a tuple
elements
a partial
then
is not
and
to form
transformation
that
program
3.2.3
description
terms
to build
Ch22DB
in order
a complete
it follows
for
in section
are built
Because
the transformation
in
(id
to
data
for
whether
nested
the
Datalog.
normal
clauses
database.
recursive
more
of testing
in our
then
or
source
for
the
STS table
data-entry
3) formed
screen
from
the
in
(Figclauses
be:
inserting
to
to create
If the
in nor-
a join-and-
calculus
in SQL.
provide
transformation
be
clauses
into
of a transformation,
If it is possible
ure
can
then
translated
relational
database
For example
is inserted
is recursive
is a little
and
which
target
are not
of
CPL
clauses
the
nate.
trans-
they
database
the
transformation
is,
as opposed
data
source
transformations
that
source
tar-
to compute.
in non-recursive
pass”:
by reading
in
unique
in
normal-form
clauses
pro-
about
is these
relational
language.
unfolding
data
ambiguity
into
The
A from
flat
directly
flat-relational
query
the
be
expression
converted
A-
be in the
It
is also
can
expression
is not
is a
additional
we wish
These
programs.
can
is no
interested
project
If a transformation
is.
that
there
smallest
represents
other
transformation
are particularly
Tsrc,
should
as well.
transformations
We
p of type
program
exclude
then
is said
a transformation
data
database
smallest
it
type
form
from-where”
to Ttgt
The
general
mal
of
of p
of p then
transformation
in
is complete
this
that
the
that
from
value
because
database
y is a
value
C.
Z’sr.
transformation.
database:
will
database
is important
generated
the
any
v satisfy
A from
to be complete
unique
for
and
v is a database
be a A-transformation
C 6 A,
A transformation
program,
and
= f-STS(P1l,
= P1l,
pr2.prirner_id
= P12,
PCR_prod~ize_lo
a
= SL,
PCR-prodsize-hi
rela-
+
problem
Details
~12),
prl-primeri.d
((pname
= SH)
E Tgt.STS
= PN1)
G primers,
= PN2)
E primers,
(pneme
can
[17].
PCR-prod~izeJo
= S.L,
PCR_prod~ize_hi
= SH)
C Src.STS-screen,
3.3 Normal Forms
We
now
database
flat
limit
our
transformations
relational.
formation
turn
In
easily
Suppose
our
the
the
the
target
we first
a normal
into
special
case
database
convert
form,
can
in some
target
Notice
in
clause
contains
(non-
is said
a relation
to be in
normal
of the
R.
form
...
+
if
. . . , ak
lation
ak=Pk,
bl=Ql,
R,
and
are
. . ..bz=
bl, ,..
transformation
if all its
have
recursive
an
This
the calls
gives
does
body
to the primer
clause
a complete
and
in the
of the
relation
in
section
3.2.3
of the
Skolem
function
Transformation
is not
algorithm
for
[17]
if
the
program
complete
forms
transformation
the
reat-
description
not
call
which
have
on any
clause.
In par-
were
been
in the
replaced
f _primer.
Tools
transformation
code-generators
languages.
terfaces
con-
for
be
in
for
to
for
a flat
is
normal
fail,
central
programs.
a
form,
reporting
of our
If
In
relational
complete
part
base
non-
the
and
tools,
if
SYBASE
(the
most
and
several
other
of the
tool
such
code
take
much
users,
normal
tools
systems,
and
as
of
to
such
the
allows
load
them
are for
language
a code
has
in-
data-sources.
implementation
and
programs
of
further
to other
meta-data
and
are being
entering
to concentrate
from
and
indata-
be constructed
([13]),
of
of
biological
as SYBASE,
constraints
means
requirement
can
read
by
programming
immediate
is an
form
ER-draw
and
this
algorithm,
languages
addition
types
efforts
since
core
to convert
TSL
implementation
programming
an ersource
of database
([18]),
the
database
will
is automated
convert-to-normal-form
terfaces
given
process
a variety
CPL
Chr22DB)
the
normal
for
Initial
generator
form.
program
in
variables,
to
which
will
of the
optional
However
is said
program
type,
equivalent
program
generators
relations
of the
3.4
gil, . . . . q$n conand the terms
only
are in normal
algorithm
transformation
database
return
ror.
program
clauses
an
R; the atoms
and constants;
QZ are built
using
function
symbols.
pl, . . ..pk.
Q>,...,
stant
symbols
and
the
clause
STS relation,
by applications
Qt)6R
the required
attributes
of the
, bl are a subset
tributes
of the relation
tain
only
source
terms
target
this
in the
target
body
The
al,
We
< P12
41, . ..74n
where
form
= f-primer(PN2),
PI1
form
X(al=Pl,.
A
that
of a tuple
ticular
database
= f-primer(PNl),
P12
is
a trans-
which
a program
of
language.
A transformation
has
case,
into
be converted
query
to
where
this
program
recursive)
it
attention
PI1
easily.
various
schema-design
convert
it
developed.
constraints
on specifying
into
These
off
the
the
substantial
would
which
and
part
like
of
to build
a transformation.
graphical
automatically
generate
transformation
Ultimately
we
schema-manipulation
clauses
the
for
relevant
a schema
We
tools
try
constraints
evolution.
have
The
complexity
man
Genome
quency
of
data
Project
schema
incompatible
structures
databases,
evolutions
an archival
genomic
Our
experience
is that
heterogeneous
must
be exchanged,
tools
and
the
databases
necessitates
the
Knowing
is an extremely
Chr22DB
with
the
fre-
tical
number
of
with
which
dress
much
the problem
formations
Acknowledgements:
of new
all the
the
schema,
of these
although
subject
do
However
entered
of
not
The
ad-
due
available
these
limited
us to
and
data
for
manner,
and
of database
In
Genome
data
which
work
to
some
also
than
There
which
underlying
schema.
To
are
flat
from
languages
[2]
23,
our
knowledge,
need
[3]
for
already
future
indicated,
algorithms
such
for
to do this
[6]
some
of
of code
gento
many
transformation
edly.
The
entry
development
[7]
to
reflect
databases.
mation
as GDB,
the
We
programs
evolution
need
which
such
to
on
the
compose
will
of
others
import
data
which
not
every
want
time
Chromosome
the
updates
to
rewrite
there
involve
[8]
the
these
is a minor
22 database,
W.
tool,”
1 is also
and
data
J.
Gar-
database,”
Nu-
2231–2236,
base
repository,”
2237-2239,
W.
R. Pearson,
1991.
(GDB),
a hu-
Nucleic
Acids
1991.
W.
Miller,
“Basic
E.
W.
My-
local alignment
of Molecular
Biology,
vol.
215,
1990.
“Rapid
with
Sci.
Gish,
Lipman,
Journal
403–410,
and
FASTP
U. S. A.,
National
sensitive
and
vol.
Center
sequence
com-
Proc.
Natl.
FASTA,”
85, pp.
for
National
Library
TREZ:
Sequences
M.
2444-2448,
“The
Biotechnology
of Medicine,
Users’
J. Cinkosky,
K.
Hart,
1990.
Information,
Bethesda,
MD,
EN-
1992.
Release
1.0.
Nelson,
and
Guide,
J. Fickett,
D.
restructuring
D.
of
[9]
archival
UPenn
G.
C.
GenBank,”
T.
G.
October
vol.
10,
no.
Technical
fields
C.
Overton,
for
3, 1994.
To
NCBI’S
in
the
appear.
See
Haas,
and
CBIL-9203.
Aaronson,
A system
and
G.
Applications
Report
J.
“QGB:
1994.
and
translator
Computer
Overton,
J.
for
features,”
querying
sequence
Computational
Biol-
To appear.
Department
Meeting
in or-
Searls,
relational
database,”
also
ogy,
B.
A
database
data-
routinely
of
Hunt,
19, pp.
genome
J.
pp.
J. Adams,
transfor-
other
vol.
19, pp.
D.
Biosciences,
repeat-
probably
from
are run
continuous
do
is
L.
sequence
mapping
Altschul,
ASN.1
of
transformation
be applied
these
programs;
programs
databases,
der
frequent
protein
vol.
“SORTEZ:
composing
transformations:
will be applied
only once,
programs
most
transformation
mation
of
Figure
ideas.
1987.
programs
issue is that
transformations
help
these
which
programs.
Another
while
some
George,
“The
and
Marr,
completion
transformation
specifying
[5]
to
been
implementation
for
presenting
in
Peter
for their
to
the
interface
S. F.
Acad.
form
driven
D.
PIR
genome
have
databases
eventual
to
Searls
diagram
Research,
P. Pearson,
search
[4]
normal
the
Acids
ers,
relational;
and
indebted
David
and
postscript
“The
to
has
as the
target
are
and
Searls.
Barker,
parison
there
research,
W.
26]).
related
language.
of
cool
25,
schemas
constraint
areas
really
to David
Research,
estab-
are
proposed
many
as
prac-
of
merging
24,
underlying
databases
developing
man
Human
of schema
proposed
of interest,
a window
al-
means
using
approach
form
are not
the
in
cleic
for
a clear
from
is the
advice
and
allows
arise
[22,
Overton
avelli,
transfor-
by
systematic
we have
erators
the
language
that
of
the
our
of normal
in
representable
(see
We
Chris
in [21],
language
systems
approaches
manipulation
no principled,
other
not
includes
of these
how
merged
proposed
that
databases
all
user
indicate
integration
our
important
References
for
languages.
heterogeneous
Central
addition
are
addressed
implement
of constraints
constraint
Related
then
entry
transforma-
transformations
a variety
specification
lished
Our
database
generators.
the
of database
model.
specify
formal
mation
code
context
also been
labori-
data
trans-
[1]
have
how
for
it is essen-
data
consequently
issues
in the
a more
lows
the
data.
previously
and
the
works
are necessary.
Some
in
on
the corresponding
underlying
tial to have
tions
written
[20] ), existing
of performing
on the
current
been
(see
code
in the
gain.
Buneman,
has
evolution
this
struc-
indicated
first-hand
to
is ex-
data
development
methodologies.
Although
evolved,
GDB,
between
is clearly
en-
a trans-
approach
relationships
target
SYBASE
Hu-
of the
the
and
data
database,
the
program.
in
large
since
source
the
specified
ous it was to transform
and
schema
useful,
in the
involved
together
and
specified
partially
from
clauses
of the
have
Chr22DB.
tremely
4 Conclusions
completely
and
formation
tures
currently
transformation,
of Energy,
Report,
at gopher.
DOE
April
gdb.
Informatics
1993.
Summit
Available
via
gopher
org.
archival
[10]
transforhence
N.
Goodman,
ments
schema
the
Base
for
genome-mapping
Workshop
transformations.
Vancouver,
S. Rozen,
a deductive
on
October
L.
Stein,
language
database,”
Programming
BC,
and
query
with
1993.
“Requirein
the
Map-
in Proceedings
Logic
Databases,
of
[11]
S. Abiteboul
as
a
and
query
of ACM
Data,
P.
Kanellakis,
language
SIGMOD
“Object
primitive,”
Conference
(Portland,
Oregon),
on
pp.
[24]
identity
in
Proceedings
Management
159-173,
of
R.
Hull
and
M.
and
manipulation
creation
in
Proceedings
Very
Yoshikawa,
of 16th
Large
Data
“ILOG:
of
Bases,
pp.
object
E.
Szeto
and
graphical
editor
schemas.
reference
PUB–3084,
ley,
[14]
455-468,
Tech.
C.
Beeri,
‘On
manipulation
Technical
Ullman,
Science
J. D.
Ullman,
Press,
Rockville,
MD
Complex
as IN-
MD
and
Knowl-
20850:
Com-
1989.
Principles
Systems
and
available
of Database
Rockville,
I.
edgebase
in
Theory
846.
Principle.
Systems
of lan-
on
of Database
II:
The
20850:
and
New
Knowl-
Technologies.
Computer
Science
Press,
1989.
[17]
[18]
A.
S. Kosky,
and
from
kosky@saul.
language
cis
“Querying
proposal,”
from
646:
Proceedings
on
Database
1992
(J.
Technical
—
Record,
new
D.
tech.
tute,
February
A. Motro,
tiple
[23]
Sheth,
“A
tool
views,”
ence
October,
eds.),
140–154,
1992.
Available
as UPenn
and
R.
sys-
SIGMOD
December
sharing
1992.
Hull,
“Worldbase:
distributed
informa-
Sciences
Insti-
1990.
“Superviews:
Virtual
IEEE
vol.
SE-13,
J. Larson,
integration
on
Software
pp.
July
1987.
785-798,
J. Cornellio,
conceptual
in Proceedings
of 4th
Engineering,
of mul-
Transactions
for integrating
on Data
pp.
in database
USC/Information
databases,”
A.
LNCS
Conference
Germany,
35–40,
to
rep.,
Engineering,
Wong,
in
bibliography,”
S. Wile,
approach
tion,”
L.
Hull,
evolution
annotated
21, pp.
S. Widjojo,
A
[22]
vol.
and
languages,”
MS-CIS-92-47.
“Schema
An
edu.
Berlin,
R.
a-
available
International
October
Report
J. F. Roddick,
tems
[21]
Buneman,
and
Springer-Verlag,
[20]
P.
query
A dissert
Manuscript
.upenn.
Theory,
Biskup
collections:
cis
of 4th
available
edu.
1993.
embedded
transforma-
Manuscript
nested
Breazu-Tannen,
“Naturally
database
.upenn.
August
limsoon@saul.
V.
for
1993.
constrains,”
L. Wong,
tion
[19]
“A
tions
and
International
pp.
S. Navathe,
schemas
176–183,
A.
vol.
LBL–
objects,”
Relations
Also
50–62,
Batini,
M.
J. Larson,
design,”
January
“Integrating
IEEE
Computer,
1986.
Lenzerini,
analysis
and
S.
Navathe,
of methodologies
323–364,
for
ACM
Computing
December
1986.
integration,”
18, pp.
Sheth
tems
A
Berke-
power
Workshop
1988.
Report
the
of complex
of Nested
C.
for
and
and
user
Confer-1988.
J. Larson,
managing
autonomous
Rep.
Laboratory,
of International
puter
4.0:
entity-relationship
Berkeley
(Darmstadt),
edgebase
[16]
extended
19, pp.
and
database
“A
database
Surveys,
1990.
“Erdraw
manual,”
Applications
Objects,
J. D.
for
and
for the
Proceedings
RIA
Markowitz,
vol.
vol.
on
1993.
S. Abiteboul
and
M.
Lawrence
California,
guages
[15]
V.
in
schema
identifiers,”
Conference
[26]
[13]
views
comparative
Declarative
International
R. Elmasri,
user
1989.
[25]
[12]
S. Navathe,
22, pp.
databases,”
183–236,
“Federated
distributed
ACM
September
database
heterogeneous
Computing
1990.
sysand
Surveys,