Document 11082263

advertisement
Center for Information Systems Research
Massachusetts
Technology
f^anagement
Institute of
Alfred P. Sloan School of
50 Memorial Drive
Cambridge, Massachusetts, 02139
617 253-1000
THE USE OF REDUNDANCY
TO IMPROVE PERFORMANCE
IN AN INVERTED HIERARCHICAL DATA STRUCTURE
John J. Donovan *
Seymour N. Small **
June 1980
CISR No. 58
Sloan WP No. 1145-80
*
Massachusetts Institute of Technology
** Center for Birth Defects Information Services
Tufts-New England Medical Center
PAGE
2
CONTENTS
Introduction
3
Background Concepts in Hierarchical Data Bases
4
Improving Performance with Family Redundancy
7
Quantification of Savings Potential Using Family Redundancy
13
Improving Performance with Collapsing of Levels
15
Quantification of Savings Potential Using Collapsing of Levels
...
17
Case Study
18
Restructuring the BDIS Database
24
Quantification of Savings Potential for BDIS
27
Experimental Results
30
Other Issues to be Considered
34
Conclusion
35
References
36
qLjO%[p
I
PAGE
3
Introduction
This
paper
techniques
database
certain
operating
two
suggests
reducing
for
applications
that
use
the
costs
of
hierarchical
a
structure with inversion on low-level fields. These techniques are:
1)
introducing "family
redundancy," i.e.,
information
record
into a
that
introducing redundant
was obtained
directly,
or
through a summarization process, fran its "parent," "child," or
"sibling"
without
records,
altering the
structure
of
the
hierarchy, or eliminating fields from any records.
2)
incorporating one record level into another.
The paper
contains formulas which make
associated with each technique.
the
formulas
are used
to
and savings
A case study will be presented in which
predict
possible savings.
performance
improvements,
following
techniques,
are
These
reported.
explicit the costs
the
The
implementation
improvements corresponded
observed
of
these
to those
predicted by the formulas*
The case
For
this
study chosen is a
syston,
these
birth defects diagnostic
techniques
approximately eighty per cent.
reduced
the
assist system.
average
costs
by
.
.
PZ^GE 4
Background Concepts in Hierarchical Data Bases
The hierarchical model of data is
of applications*.
extensively used in a broad number
those characterized by a
For certain applications,
relatively stable database
with pre-defined queries or
operations,
have found the hierarchical system to be quite appropriate
A key
goal of
in general,
[4]
is to
reduce data
since redundancy wastes space and creates the potential for
redundancy,
inconsistency
significant
a systen designer,
we
in
the
However,
data.
improvement in
access
times,
redundancy
this
and
can
also
yield
savings must
be
In this paper,
two general
redundancy-based database design techniques are introduced.
In order to
weighed against the corresponding costs**.
illustrate
these
techniques,
consider
the
following
hypothetical
database:
Figure lA depicts a hierarchical structure of three record types:
*
STUDENT - containing ID number, name and address.
*
YEAR - containing year and the department
was registered during that year.
*
COURSE - containing course number and the grade received by the
student.
in which the student
Figure IB depicts one instance of that hierarchy in which the student
is Steven Melnick,
and his address is Eastgate.
He has attended school
*
is
management system,
IMS,
a hierarchical data base
having the second most number of installations in the world
** See Severance for a general model of file costs [9].
reported as
[71
PAGE
5
for two years. In 1978, he was in Department 6, and in 1979, he switched
to Department 15. He took two courses during 1978, and one in 1979.
This database organization is particularly well-suited for responding
to queries such as:
"Print all
an index on the name field,
grades for Melnick."
Melnick
's
Assuming there is
STUDENT record could be accessed
directly, and all of the YEAR and COURSE records read sequentially.
—
PAGE
STUDENT
ID NUMBER*
I
I
+
YEAR
COURSE
NAME
|
I
ADDRESS
I
I
6
PAGE
7
Improving Performance with Family Redundancy
Let us examine
difficult query:
(GPA)
the steps involved in answering
"List all students who
the following,
moi(>
received a grade point average
greater than 3.4 in 1979."
Assuming that the YEAR field is indexed, one access would be required
to retrieve
year;
the 1979 YEAR
record for
the first student
then one access would be required
underneath that YEAR record.
GPA for 1979 would
access would be made
record to get the
for each of the COURSE records
Using the grades retrieved,
then be computed.
If the average
to access the middle
of the hierarchy,
an
the parent STUDENT
These steps
for every student who was enrolled in 1979.
one student's
exceeded 3.4,
upward from the YEAR record to
name of the student.
enrolled that
would be repeated
Thus, it would be necessary
go downward,
if the query
and
condition was met, return to the root record.
In order to avoid
student,
the
accessing all of the 1979 COURSE
grade point average could
be computed and placed
YEAR record, as illustrated in Figures 2A and 2B.
maximum of
and the
two accesses per student
COURSE records
accesses have been
the middle of the
need never be
eliminated,
hierarchy,
records for each
Under this schone, a
are required to answer
accessed.
and it is now only
and if the query was
in the
Thus,
the query,
all downward
necessary to access
met,
return to the
root record.
However,
redundancy has been introduced,
resulting in extra storage
PAGE
and maintenance.
changed to an 'A',
be updated.
For example,
if Melnick's
8
grade in course 15.565 is
then both the COURSE record and the YEAR record
must
In the structure of Figure lA, on the other hand,
COURSE record need be updated.
only the
——
PAGE 9
STUDENT
I
ID NUMBER*
I
I
+
III
III
NAME
I
ADDRESS
I
I
+
YEAR
I
I
I
YEAR*
I
I
DEPT
I
I
I
GPA
|
I
H
COURSE
+
+
I
I
COURSE*
I
I
GRADE
I
Figure 2A:
I
Placing GPA in YEAR
-+
I
I
123-456
Steven Melnick
I
Eastgate
I
I
I
-+
-+
-+
+-
I
1978
3.5
I
I
-+
15.564
I
A
I
I
15.565
I
B
Figure 2B:
1979
I
15
I
^
1
I
+
+
Instance
15.571
I
A
4.0
PAGE 10
In the previous example,
information
accesses were reduced by bringing redundant
Accesses
upward.
can
also
be
redundant information within a single level.
"List all
query:
exceeding 3.4."
people who received an
If the structure
record.
However,
introducing
consider the
in 15.565 and
in Figure lA
is used,
have a GPA
would be
it
of computing the GPA for every
If the structure in Figure 2A is
in 15.565.
each 15.565 COURSE
accesses is reduced to two per
the number of
used,
'A'
by
For example,
'A'
necessary to perform the lengthy process
student who received an
reduced
to one per
the accesses can be reduced even further,
by repeating the year's GPA in each of the COURSE
15.565 COURSE record,
records for that year,
Although this
as depicted in Figures 3A and 3B.
seons inordinately redundant,
it may result in a more efficient system,
depending on the frequency of such queries.
These two examples
concept
of
"family
of database restructuring illustrate
redundancy,"
technique
the general
whereby,
"family"
information obtained fran related records (parents, siblings,
children,
etc)
a
is introduced into a record in order to improve the performance of
certain queries.
any query
Note that there is
that could
have been
answered at least as efficiently
"family redundancy"
accesses needed
no change in capabilities,
answered before
to respond to
restructuring can
any query
because "family redundancy" neither alters
may reduce)
or operation.
could
always
be
answered
by
the
This
number of
is true
the basic arrangement of the
record types, nor deletes any of the original fields,
query
be
In other words,
after restructuring.
will not increase {and
i.e.,
ignoring
implying that any
the
information and instead, following the original procedure.
newly-inserted
Hence,
the
PAGE 11
accesses,
information can
of family
introduction
and
as was
shown in the
never increase
previous examples,
the number
of
can sometimes
reduce them.
Note that two assumptions have been
first
is that
interface,
the database
where
each
is
made in the above example.
accessed using
database
movement
a high-level
is
pre-defined
The
language
by
the
programmer. This can be significantly different fron the query processor
situation where the exact sequence of
database operations is unknown to
the applications programmer.
The second assumption
That is to
say,
is that there is
the DBMS cannot
based on the GRAEE field.
no use of virtual
store in the GPA
data [5].
field an "equation"
PAGE 12
STUDENT
III
+
H
I
ID NUMBER*
I
NAME
I
ADDRESS
|
1
+-
YEAR
+
H
I
I
I
YEAR*
I
DEPT
I
I
GPA
I
I
+-
COURSE
+
+
COURSE*
I
GRADE
I
II
123-456
I
I
I
Repeating GPA in YEAR
Figure 3A:
I
GPA
I
I
I
I
I
I
I
II
Steven Melnick
I
Eastgate
I
+
+
I
I
II
+
I
1978
+
+
I
6
1
!
3.5
I
I
I
I
I
^
15.564
I
I
A
I
I
3.5
I
I
I
I
+
15.565
II
+
I
1979
15
I
I
I
I
4.0
I
I
I
1
I
-(.
Figure 3B:
3
I
I
3.5
I
1
I
I
+ +
Instance
15.571
III
I
A
I
4.0
I
+
PAGE 13
Quantification of Savings Potential Using Family Redundancy
There is a query speed
processing advantage and countervailing space
and maintenance costs associated with family redundancy:
1)
The savings due to faster queries can be canputed as follows:
= difference in cost per query i, before and after
Ri
database restructuring (function of CPU, I/O,
DBMS)
= average number of queries of type
Ni
i
during time
period t
Savings =
2)
RiNi
(1)
The costs of the extra space are:
b
= cost per byte per time period t
B
= number of extra bytes
Cost =
3)
51
($A>yte)
(b * B)
(2)
The maintenance costs associated with updating the extra records
are:
Uj
= cost of updating extra records for data update type
Vj
= number of data updates of type
Cost =
j
j
per tLme period t
?^U3Vj
(3)
Therefore, the overall savings per time period t are:
Savings(t) =
(
H RiNi)
- (b*B) -
(
i;
UjVj)
(4)
In a later section of this paper, Equation 4 shall be used to predict
PAGE 14
savings in an actual system.
PAGE 15
Improving Performance with Collapsing of Levels
If the structure
"Print
all the
of Figure lA was
grades for
every student,"
be the sum of
records accessed would
then the
total number
of
YEAR and
is collapsed into
then the number of accesses is
depicted in Figure 4A,
the query:
the number of STUDENT,
If YEAR
in the database.
COURSE records
mainly used to answer
COURSE as
only the sum of
the number of STUDENT and COURSE records.
This
example illustrates
the
concept
of "collapsing
of
levels,"
whereby, levels of the hierarchy are eliminated and the information that
is duplicated
they contained
redundancy,
collapsing
results,
and in
example,
the query,
fact,
of
in their
levels does
children.
not
Contrary
always
it can give significantly
give
to family
beneficial
worse results.
"What is Melnick's GPA in 1979?" results in having
to access all COURSE records in 1978, as well as those of 1979.
one access could be made to the 1979 YEAR record,
collapsing,
For
Before
followed
by one to each of its COURSE records.
Since the number
decision whether
of accesses saved per report can
or not to collapse
levels,
is not
be negative,
the
always clear-cut.
Generally, a level can be eliminated if that level has a small number of
bytes, and does not contain information that is often used for decision-
making in
the hierarchy.
year in the
YEAR record,
specific year of a
file)
,
then
If the
sample hierarchy contained
and if it
was rarely required to
student's file (as might be the
there might be a
only the
pick out a
case with a history
positive benefit in collapsing
the YEAR
PAGE 16
(The inclusion of the GPA is an entirely
record into the COURSE record.
separate matter.)
STUDENT
I
ID NUMBER*
NAME
I
ADDRESS
I
|
COURSE
I
COURSE*
GRADE
I
I
DEFT
I
I
I
I
I
I
YEAR
I
Figure 4A: Collapsing of levels
123-456
H
I
I
15.564
Steven Melnick
I
III
A
I
I
1978
I
Eastgate
+
I
6
I
j
I
I
+
I
1
15.565
III
B
I
!
1978
6
I
-+
I
I
15.571
III
I
A
I
1979
I
Figure 4B: Instance
I
15
I
I
-+
PAGE 17
Quantification of Savings Potential Using Collapsing of Levels
The
Equation
1)
function
savings
4
collapsing of
for
levels
is
identical
with the following exceptions:
The savings due to faster queries are the same as before, but
Ri can now be negative.
2)
Collapsing does not necessarily involve extra storage. The
nimber of extra bytes, B, is determined as follows:
Extra Storage
where
C
=
=
{
(C*D)
- [P(I>+D')]
- Q}
(5)
number of 'child' records
P
=
number of 'parent' records
D
=
number of data bytes per 'parent' record
D'
=
number of index bytes per 'parent' record
Q
=
fixed number of bytes required to maintain a level
to
PAGE 18
Case Study
Defects Information
the Birth
application,
research project of the March of
Tufts New
England Medical
various computer
System
(BDIS)
actual
an
in
joint
a
,
Dimes Birth Defects Foundation and the
Center,
services to aid
savings
for dramatic
potential
the
illustrate
To
will
be examined.
BDIS provides
physicians in treating
patients with
It is unplemented on an IBM/370 under VM/CMS and uses a
birth defects.
This case study centers on the
database management system called FXXUS.
Diagnostic Assist Facility,
which is still in the
one service of BDIS,
testing stage.
The Diagnostic Assist facilty works as follows:
1)
Physicians choose
retardation.
2)
condition,
patient's
their
retrieved
cleft
e.g.,
mental
palate,
(SELECTION PHASE)
defects which
Birth
that describe
those symptoms
from a report
the database
from
chosen symptoms
of the
exhibit any
for
are
(RETRIEVAL
consideration.
PHASE)
3)
A score is
based
on
computed for each of the
the
strength
of the
match
selected by the physician and those
the birth defect.
The birth
considered birth defects,
between
The
physician is
defects in order
answers are
defects are then ranked according
asked questions
re-score the
(SCORING PHASE)
relating to
to more carefully refine
used to
symptoms
stored in the database for
to the score, and the top ten are maintained.
4)
the
the top
the diagnosis.
birth defects.
birth
The
(CHALLENGE
PAGE 19
PHASE)
For example, if the physician were to enter the following symptoms:
disproportionate short stature
rhizomelia
cleft palate
the
respond
might
computer
with
the
following
list
of
potential
syndrcxnes:
Kneist Dysplasia
Hermann-Opitz Arthrogryposis Syndrome
Short Rib-Polydactyly Syndrome
The major storage
the symptoms
structure is a hierarchical
that are commonly found
(See Figure 5.)
database which stores
with each birth
Symptoms which are always seen in
birth defect are considered REQUIRED,
defect syndrome.
patients with that
others are called non-required or
choice symptoms. Closely-related symptans for a birth defect are grouped
into SETS. Associated with each set is a WEIGHT and a CHOICE COUNT.
The
WEIGHT is the number of points given to the syndrome if the criteria for
"satisfying" the set are met.
match on all of
a syndrome
normalized
a set,
the patient's case must
the REQUIRED symptoms in the set,
the choice symptoms,
for
To satisfy
is
by the
plus some nunaber of
as indicated by the CHOICE COUNT.
the
total
sum of
the
weights
possible sum
for
of the
that
The total score
satisfied
syndrome.
sets,
Figure
6
P^lGE 20
illustrates an instance of the database in table format.
NUMBER OF
OCCURRENCES
SYNDROME
+
+
I
I
I
NUMBER*
I
NAME
I
I
I
I
700
|
+
^
1
SET
III
I
I
I
SET~NUMBER
I
WEIGHT
i
I
2000
I
I
I
I
I
CHOICE"COUNT
+
H
I
SYMPTOM
I
I
+
+
I
1
I
I
I
SYMPTOM*
I
REQUIRED
I
I
I
+
+
Figure
5:
The BDIS Hierarchy
13000
.
PAGE 21
Number: 123
Name: XYZ
WEIGHT
CHOICE_COUNT
SET
1*
4
2
2**
REQUIRED
SYMPTOM
eye, downward slanting
eyeball, deepset
eyeball, atrophy of
eye, ocular hyper tol ism
immobile facial expression
mouth, small
lips, puckered
mouth, downturned corners
Figure 6:
Instance of the BDIS Hierarchy
* In Set 1 of the illustrated syndrome, two of the four symptoms must be
matched in order to receive four points. One symptom does not result in
any points, while three or four matches still give only four points.
** In Set 2,
the patient must have an immobile facial expression, and
one of the other three symptoms to receive three points.
If the first
symptom is not matched,
then no points will be awarded (even if all the
other symptoms are matched)
PAGE 22
The greatest usage of computer time in
the 3DIS system is during the
RETRIEVAL PHASE, when the birth defect information is retrieved from the
database.
explanation of the database accesses
Following is a detailed
that are required:
1)
Get the first symptom from the physician's list.
2)
Do an indexed retrieval to the first occurrence of that SYMPTOM
record in the database.
3)
the SYNDROME record to
Go up to
find the number of
the birth
defect containing that symptom.
4)
Get the
next occurrence
symptom may
be present
of that
SYMPTOM record.
in multiple
(The same
birth defects.)
If more
exist, return to Step 3.
5)
Repeat Steps
6)
For
each
2
to 4 for every symptom in the physician's list.
of the
syndromes
eliminating duplicates),
that
get all
has been
of the
retrieved
(after
associated SET
and
SYMPTOM records.
The last step is required for the following three reasons:
1)
the required symptoms for a set have
It is not known if all of
been matched, until all of the SYMPTOM records in that set have
been retrieved.
2)
The total possible score
(sure
defect cannot be computed,
of
the set weights)
until all
for a birth
of the SET records for a
birth defect are retrieved.
3)
Though
the
SYMPTOM
selected are not
records
for sets
immediately needed,
retrieve all of the SET and
which
it is
have
not
been
more efficient to
SYMPTOM records in one pass,
than
PAGE 23
to selectively retrieve records.
FOCUS,
This is
due to a facility in
whereby multiple records can be retrieved with a single
database
call.
(The
effects of
this
facility on
database
access costs will be explained in a later section.)
Thus,
in order to score the birth defects,
the database for each of the
addition,
for
each of
the program had to go up
occurrences of the selected symptoms.
the potential SYNDROMES,
it was
retrieve all of the associated SET and SYMPTOM records.
In
necessary to
PAGE 24
Restructuring the BDIS Database
improve
To
the database
efficiency,
restructured
was
with
the
(REQUIRED COUNT)
was
following changes.
1)
The number of required symptoms in
SET record.
into the
inserted
a set
This eliminated
retrieve all of the SYMPTOM records
the need
to
in a set just to determine
if the required symptoms had been matched.
2)
(TOTAL WEIGHT)
The total weight of the sets
the SYNDROME
retrieve
Now
record.
of the
all
SET
it was
records
was inserted into
longer
no
necessary
to determine
the
to
maximum
possible score.
3)
The SET records were incorporated into the SYMPTOM records. This
meant that
the path up to
the SYNDROME record for
an indexed
symptom was shorter, and that the total number of records under
a root SYNDROME record was less.
The first two changes are
examples of "family redundancy" techniques
that result in the database depicted in Figure 7A.
The last change is a
collapsing of levels, which gives the database of Figure 7B.
With the database of Figure 7B,
canputed by
simply goire?
selected symptoms.
retrieve all of
up the
We can
a score for each birth defect can be
database for
then select
their symptoms for use
each occurrence
only the
of the
top candidates
in the CHALLENGE PHASE.
and
As a
result, the following reductions in the number of required accesses have
been made:
P^GK 25
1)
The number
of accesses required
to go
up to a
root SYNDROME
record is two instead of three.
2)
The number of
trips down the hierarchy is limited
top birth defects)
.
Formerly,
every considered
to ton (the
birth defect
required a downward trip.
3)
Less records are retrieved on each of these downward trips,
as
the SET records have been collapsed into the SYMPTOM records.
Thus, compared to the original syston, a significant reduction in the
number of accesses has been made.
PAGE 26
SYNDROME
+
I
I
NUMBER*
I
NME
I
|
I
TOTAL WEIGHT
~
I
I
I
+
SET
SET NUMBER
I
WEIGHT
CHOICE COUNT
REQUIRED COUNT
I
SYMPTOM
-+
I
I
REQUIRED
SYMPTOM*
I
+
Figure 7A: BDIS database after Family Redundancy
SYNDROME
III
I
NUMBER*
NAME
I
III
III
|
I
TOTAL WEIGHT
|
SYMPTOM
I
SET_NUMBER
|
WEIGHT
I
I
CHOICE COUNT
|
REQUIRED COUNT
SYMPTOM*
I
Figure 7B: BDIS database after Collapsing of Levels
I
I
REQUIRED
I
I
PAGE 27
Quantification of Savings Potential for BDIS
Using Equation
changing the
cost for
system
(K)
equation is
following savings
the
,
and computer
a further manpower
and subtracting
4,
obtained:
Savings(t) =
iRiNi) -
(
-
(b*B)
(lUjVj) - K
(6)
'
i
For this case:
is a minimal
b*B
The
cost.
increase
in storage
was
under 250K bytes.
{
the data base
is small because
T. UjVj)
J
(In test runs,
few times per year.
significant
no
is updated
difference
in
only a
there was
cost
the
of
updating the old and new databases.)
is 1,
i
only one
i.e.,
type of
query (diagnostic
assist) exists.
averaged over several years
is small when
K
and in this particular case,
thus,
did
the restructuring
overhaul of the system,
coincided with a major
and
of use,
amount
not
to an
additional
expense.
Therefore, equation
Savings
R,
(t)
which is
(6)
reduces to:
= R * N
(7)
the difference
in cost
of a
query,
is
exceedingly
difficult to conpute theoretically. Firstly, FOCUS maintains its data in
4000 byte pages, and therefore, an access may or may not imply an actual
disk
I/O.
Secondly,
the
cost of
retrieving
information from
the
PAGE 28
database consists
relatively fixed
charge of
making a
database
plus a variable cost depending on the number of records accessed.
call,
Lastly,
many
of a
the
single call/multiple record
records can
be retrieved
retrieved independently.
facility of FOCUS
smaller
at a
cost than
means that
if they
were
Despite all of these problems, and for lack of
a better choice, the theoretical estimate of the number of accesses made
was used to attenpt to predict system costs.
Let us therefore,
compute Ao,
number of accesses under the old
the
systan, and Ar, the number of accesses under the restructured system.
From the
database it was determined
the average three sets and 19 symptoms.
syndranes in
which a particular symptom
cases were examined,
and each contained
contains on
that each syndrome
In addition, the mean number of
exists is fifteen.
152 test
an average of eleven symptoms,
and retrieved a mean of 210 candidate birth defects.
With this information, Ao can be computed as follows:
Ao
Accesses up
= (Accesses up) + (Accesses down)
= #Symptoms/Case
*
= 495
Accesses down
= ICarri idates
= 4830
#Syndromes/Symptom
*
taccesses to go up
(11 * 15 * 3)
*
#accesses to get all symptoms
210 * (1 +
3
+ 19)
Therefore, under the original system, the number of accesses required
(Ao)
is 5325.
PAGE 29
Ar can be
computed in a similar manner with
the following important
exceptions:
1)
The number
of accesses required
to go
up to a
root SYNDROME
record is two instead of three.
2)
The number
of trips down the
birth defects.
hierarchy is limited to
to scoring ties,
(Due
the
the top
average number of
trips is twelve rather than ten.)
3)
The number of accesses required to
get all of the symptoms for
a birth defect is 20 instead of 23.
Accesses up = 330
(11 * 15 * 2)
Accesses down = 240
(12 * 20)
The total number of accesses under the new system
Implies
that the
new system
will use
accesses as the original systen.
(Ar)
about one-ninth
is 570.
This
the number
of
PAGE 30
Experimental Results
In this section, the observed results when the system was implemented
The experiment
are reported.
original and
compared I/O
restructured systans for ten
and CPU
usage under
The
test cases.
the
cases had
fron one to twenty-one symptoms.
The experimental data is summarized in Figures
8
and 9.
In Figure 8,
the nun±)er of I/O requests is illustrated and in Figure 9, the CPU usage
The
is shown.
resource usage for
data was obtained by monitoring the
the retrieval phase
of the program and subtracting the
operations, i.e., opening databases,
loading modules,
usage for basic
etc.
Note that
since a FOCUS access does not imply a disk I/O, then the absolute values
of
the
experimental
results
As expected,
relatively
the I/O usage
constant.
symptoms are entered,
This
under the restructured
is
went
down
the hierarchy
defect, and therefore,
because
the program only
the top candidate birth defects.
system
directly
compared
with
the
However, the following observations can be made:
formulated estimates.
1)
cannot be
On
for
no matter
system is
how
many
goes down the tree for
the other hand,
every
the old
candidate
birth
as more symptoms imply more candidates,
its resource usage varies closely with the number of symptoms.
2)
The number of
I/O's under the restructured
system compared to
the original systen, ranges from one-sixth (with four symptoms)
to
one
twenty-eighth
(The
one
candidate birth defects,
and
(with seventeen
symptom case only retrieved five
symptoms)
.
PAGE 31
therefore, both systems had similar usage.)
Using the regression line statistics and the case of eleven symptoms,
the new system saves sixteen CPU seconds and 1500 I/O operations.
a
CPU rate of
$30 per minute and an I/O rate of
amounts to a savings of ?11 per
diagnosis.
$2 per thousand,
With 100 users,
five cases per month, the savings could reach $5500 per month.
Using
this
each doing
PAGE 32
/
30
I/O's
(hundreds)
/
25
/
/
/
/
20
/
o
/
/
15
/
10
yO
/
r
r
c
r
r
10
5
r
r
15
20
Number of symptoms
(o =
old system, r = restructured system,
(
—
*
= tie)
= regression line)
Figure 8: Experimental Results (I/O Usage)
PAGE 33
CPU
seconds
35
o
/
/'
I
30
25
20
15
10
O
/
o
r
/
J^'
/
/
r
/
/
^^5
^^^
^r
r
10
15
20
Number of symptoms
Figure
9:
Experimental Results (CPU Usage)
PAGE 34
Other Issues to be Considered
some other points of
Following are
interest that have
been touched
upon in this paper, but which are beyond the scope of this analysis.
Managerial Implications
Often,
:
the costs of the query and the costs of maintenance are
borne by two different
organizations.
Therefore,
these costs
may be weighed differently by a syston implementor.
Technical Issues :
An interesting
question concerns the independence
restructurings of the same database,
obtained
by
collapsing
a
level
e.g.,
are the advantages
independent
improvements that are being made elsewhere.
of multiple
of
other
PACK 35
Conclusion
The two concepts introduced here, family redundancy and collapsing of
levels, can under certain circumstances,
lead to dramatic savings.
The
conditions for a successful database restructuring include:
1)
A high ratio of database accesses to updates.
2)
A well-defined set of database operations.
3)
A small
number of data fields
that must be made
redundant in
order to obtain a high reduction in accesses.
4)
A systen that can be restructured cheaply and efficiently.
PAGE 36
References
1.
A.F. Cardenas, DATABASE MANAGEMENT SYSTEMS, Allyn and Bacon,
Boston, Mass.
2.
(1979)
A.F. Cardenas, "Analysis and Performance of Inverted Database
Structures," COMMUNICATIONS OF THE ACM 18, no.
5,
253-263
(1975)
3.
C.J. Date, AN INTRODUCTION TO DATABASE SYSTEMS, 2nd ed.,
Addison-Wesley, Reading, Mass.
4.
(1977)
J.J. Donovan and S.E. Madnick, "Institutional and Ad-hoc
Decision Support Systems and their Effective Use," DATABASE
no.
5.
3,
8,
Winter (1977)
J.J. Folinus, S.E. Madnick and H.B. Schutzman, "Virtual
Information in Data-Base Systems," SLOAN lORKING PAPER 723-74,
Center for Information Systems Research, M.I.T., July (1974)
6.
IBI, FOCUS USER'S MANUAL, Information Builders Inc., New York,
NY (1978)
7.
R.G. Ross, "Data Base Systems: Design, Implementation and
Management, Part III," COMPUTERWORLD, June 5 (1978)
8.
M.E. Senko, E.B. Altman, M.M. Astrahan and P.L. Fehder, "Data
Structures and Accessing in Data Base Systems," IBM SYSTEMS
JOURNAL 12, no.
9.
1,
30-93 (1974)
D.G. Severance, "Some Generalized Modeling Structures for Use
in Design of File Organizations," Doctoral Thesis, University
of Michigan (1972)
10. G.H. Sockut and R.P. Goldberg,
"Database Reorganization -
Principles and Practice," ACM COMPUTING SURVEYS 11, no.
4,
PAGE 37
371-395 (1979)
11. D.C. Tsichritzis and F.H. Lochovsky, DATABASE MANAGEMENT
SYSTEMS, Acadenic Press, New York, NY (1977)
12. D.C. Tsichritzis and F.H. Lochovsky, "Hierarchical Database
Managonent," ACM COMPUTING SURVEYS 8, no. 1, 105-124 (1976)
13. G. Wiederhold, DATABASE DESIQ^, McGraw Hill, New York NY
(1977)
Date Due
DE01S198f
(-.,.
"61
r^UL
APR.
1 1992
1
3
:^-
Lib-2(5-67
ACME
BOOKBINDING
NOV
'
1
CO., INC.
1983
100 CAMBRIDGE STREET
CHARLESTOWN, MASS.
I
HD28.IV1414
no.ll33- 80
Schmalensee, R/The new
B8
DxBKS
OxBKS
73956J._.
orga
industrial
OOr
iiiiiiiilii
TDSD DOl
3
HD28IV1414
no,
1134-
Banks, Louis.
/Living
74032g..
DxBK<
.
80
a
in
newsocracy
00137171
002 050 QM2
TDfiO
3
bMfl
Tflfl
H028.IVI414 no.ll35- 80
Wong, M. Antho/A Monte Carlo study of
740340
D.^fiKS
_00l3.34r
.
002 005
TOflO
3
.
flti3
HD28.IVI414 no.ll36- 80
Henriksson, Ro/Tests of market timing
001323"
D»BKS
740309
TOaO 001 TT2 T5M
3
«
( \\hn,)]^i ,i\^^
HD28.M414 no.ll40~ 80
Schmalensee, R/Product differentiation
D*fE
D*fBKS
7403333
0013264^
TOflD
3
001 TTb
Ifl?
HD28.IVI414 no.1141- 80
Buer, Terie K. /Investigation of make o
D.it.BKS
001323"
740829__.
.
I
I
3
TOflO
.
.
001 TT2 T13
HD28.IV1414 no.ll42- 80
Nevis, Edwin C/Gestalt awareness proce
iliilii
"°"iiiiiiiiiiiiilili
3
TOaO 001 TT? 55a
HD28.M414 no.ll43- 80
Deborah /Onstage with labor medi
Kolb,
0*BKS
740812
Q013235J..,
TOaO 001 iT2 525
3
HD28.I\/1414 no.ll44- 80
Berndt, Ernst /Short run labor
74081G
D.xBKS
product
00.13.22.8.4..
TOaO 001 TTE T3i
3
,iM6-?ro
3
TOaO ODM l?a T2W
HD28.M414 no.ll46- 80
^""^ /Getting medical attenti
740835
.D»BKS ..
00132387
y^.^il^^S-
_
3
TOflO
001 TT2 TTb
rvo4- p\-U^I
")
Download