Document 11072036

advertisement
DEW-*
HD28
.M414
no.
<?3
ALFRED
P.
WORKING PAPER
SLOAN SCHOOL OF MANAGEMENT
Quality Data Objects
WP #3517-93
December 1992
CISL
Richard Y.
M.
P.
WP# 92-06
Wang
Reddy
Sloan School of Management,
MIT
MASSACHUSETTS
INSTITUTE OF TECHNOLOGY
50 MEMORIAL DRIVE
CAMBRIDGE, MASSACHUSETTS 02139
Quality Data Objects
WP #3517-93
December 1992
CISL
Richard Y.
M.
P.
WP# 92-06
Wang
Reddy
Sloan School of Management,
*
MIT
see page bottom for complete address
Richard Y. Wang E53-322
M. Prabhakara Reddy
Sloan School of Management
Massachusetts Institute of Technology
Cambridge,
MA 01239
3 1993
1
Quality
Data Objects
Richard Y. Wang
M. Prabhakara Reddy
December
1992
CISL-92-06
Composite Information Systems Laboratory
E53-320. Sloan School of
Management
Technology
Cambridge. Mass. 02139
Massachusetts
Institute of
Prof. Richard Wang
(617)2530442
Bitnet Address: rwang@eagle.mit.edu
ATTN:
1992 Richard
Y.
Wang and M.
Prabhakara Reddy
Work reported herein has been supported, in part, by MIT's Total Data
Quality Management (TDQM) project. MIT's Productivity From Information Technology
(PROFIT) Consortium. MIT's International Financial Service Research Center (IFSRC) and MIT's
Center for Information Systems Research (CISR). In particular, the authors wish to thank Prof.
Stuart Madnick and Dr. Amar Gupta for their support to this research.
Acknowledgments
Quality Data Objects
ABSTRACT
Needs
resources are becoming increasingly
This paper investigates
critical.
associate data with quality information that can help users
quality of data.
investigate
its
Specifically,
includes a description of the
and
object,
a
mechanism
The behavior
object.
we propose
and behavior.
structure
management
for a quality perspective in the
datum
how
make judgments
The structure of the quality data
datum
to
of the
the concept of quality data object
object, its
to associate the
of data
and
object
corresponding quality description
object with
its
of the quality object includes a set of
quality description
methods
quality dimensions (such as timeliness, completeness, credibility).
to
measure
In addition,
we
have developed a quality data object algebra that includes quality comparison
methods and an algebra
domain.
It
that extends the relational algebra to the quality data object
allows for a systematic construction of retrieval methods for quality data
objects.
The concept of quality data
the design
object
object presented in the paper
and manufacture of data products.
proposed
in this
We
is
step toward
envision that the quality data
It
will enable users to
the quality of data products according to their chosen criteria;
hope
first
paper can be used as basic building blocks for the design,
manufacture, and delivery of quality data products.
users to
a
buy data products based on
their quality requirements.
that the concepts of quality data objects
improve data quality and data
reusability.
it
measure
will also enable
In this
manner,
and quality data products
we
will help
1.
2.
Introduction
1
work
1.1.
Related
1.2.
Research Focus
2
3
The quality data object
2.1.
5
Structure of the quality data object
2.1.1.
Semantics of the is-a-quality-of link
2.1.1.1.
2.2.
6
Difference between is-a-part-of and is-a-quality-of
6
The quality data
object
schema
Behavior of the quality data object
2.2.1. Currency
3.2.
8
9
Volatility
10
2.2.3.
Timeliness
11
11
2.2.5.
Accuracy
Completeness
2.2.6.
Credibility
12
A quality data object algebra
3.1.
7
2.2.2.
2.2.4.
3.
5
Difference between is-a and is-a-quality-of
2.1.1.2.
2.1.2.
5
Quality comparison methods in a quality-description object
i_deep_equal _method
q_equal _method
Quality algebraic methods
12
13
14
3.1.1.
15
3.1.2.
15
3.2.1.
3.2.2.
3.2.3.
3.2.4.
3.2.5.
Method
Union Method
Difference Method
Projection Method
Cartesian Product Method
Selection
4.
Concluding remarks
5.
Appendix
6.
References
A
16
16
16
17
17
18
19
20
22
Quality Data Objects
Introduction
1.
The quality
implemented by a conventional data base management
of data in a database
systems (DBMS) has been treated, primarily, through functionalities such as recovery, concurrency,
and security control
integrity,
(e.g.,
Bernstein
Codd,
1982;
Codd,
Wood,
1981;
Hoffman, 1977; Hsiao, Kerr,
Qian
&
1986; Date, 1981; Date, 1990;
&
Denning
1981; Bernstein, et
&
some
failure has
consistency of data
is
Madnick, 1978; Korth
&
failures,
that are
it
is
known
to a state that is
to
be
when
multiple users update the database concurrently. Integrity aims
may
be caused by
by mistakes on the part of the operator or the application programmer, by system
the last of these, however,
falsification;
is
much
not so
a matter of
of security; protecting the database against illegal operations, as opposed to those
merely invalid,
is
the responsibility of the security subsystem (Date, 1985).
These functionalities are necessary but not
the end-user's perspective (Johnson, Leitch,
Liepins, 1989;
&
rendered the current state incorrect. Concurrency control ensures that the
preserved
even by deliberate
integrity as
1970;
Silberschatz, 1986; Martin, 1973;
preventing invalid updates against the database from happening. Invalid updates
errors in data entry,
Codd,
1981;
al.,
Denning, 1979; Fernandez, Summers,
Wiederhold, 1986; Ullman, 1982). Recovery restores the database
correct after
at
& Goodman,
Wang &
&
sufficient to
ensure data quality in the
Neter, 1981; Laudon, 1986; Liepins
Kon, 1992; Zarkovich, 1966).
Integrity constraints
and
example, are essential to ensuring data quality in a database, but they are often
&
DBMS
from
Uppuluri, 1990;
validity checks, for
just the
beginning of a
continuing data-integrity program that will ultimately address the real needs of users for data that
can be used as an input to the user's decision making process (Maxwell, 1989). In general, data in the
DBMS may be used
by a range
of different organizational functions with different perceptions of
what
constitutes quality data in terms of dimensions such as accuracy, completeness, consistency, and
timeliness (Ballou
&
Pazer, 1987;
Huh,
et
al.,
1990;
Redman,
1992).
Consider the following example scenarios:
•
A
person's
name
is
carried as
Rockart in yet another.
J.
F.
Rockart in once place, John
All are technically "true"
provided by the conventional
in the database consistently ?
F.
Rockart in another, and Jack
and would pass the
DBMS, but which one should be considered
integrity constraints
as accurate and stored
A
•
client workstation
at the
end
runs business applications using data downloaded from a database server
Whereas, data
of each day.
in the server is
updated instantly with changes and new
information through on-line transaction processing.
Thus, data
in the client
workstation
is
never current from the server and some user's viewpoint.
Earning estimates for companies are stored
•
and how are
not,
making
it
difficult to
in a
database but
who made
these estimates, when,
judge the credibility of the data by those
who
are not
familiar with the context.
In these
and other similar
situations, the quality of data
a matter of data validity but rather of its
information that can help users
usag e.
how
would be useful
It
make judgments
managed by
the
to associate
and manage data
is
equipped with the capabilities
measure the quality of data they need and
conforms with
LL
is
not so
data with quality
to structure
in
such a
way
that users can be
to retrieve the data that
their quality requirements.
Related work
An
attribute-based research that facilitates cell-level tagging of data has been proposed to
enable users to retrieve data that conforms with their quality requirements (Wang, Kon,
1993;
Wang, Reddy, & Kon,
effort are a
methodology
1992;
Wang &
Madnick,
for analyzing data quality
model description,
relational
indicator algebra can be used to process
From
The problem with
Li, 1987),
and
Codd,
1970;
SQL
(1) In
corresponding quality description through the
created through the concept of quality key.
Madnick,
ER model proposed
an attribute-based model encompassing
a quality indicator algebra that extends the
1979;
Codd,
1982;
Codd,
1986).
The
quality
queries that are augmented with quality indicator
these quality indicators, the user can
this research is twofold:
&
Included in this attribute-based research
requirements that extends the
a set of quality integrity rules,
model proposed by Codd (Codd,
requirements.
1990).
&
by Chen (Chen, 1976; Chen, 1984; Chen, 1991; Chen
a
much
of the quality of data for the specific application at
hand. The research question here
to
DBMS
join
(2) In
make
order
a better
judgment of the quality of
to associate the
data.
application data with
operation in the model, an
artificial link
needs
order to be able to judge the quality of data,
to
it
its
be
is
necessary to compute data quality dimension values and other procedure-oriented quality measures.
Although these could be accomplished using the
that of the object-oriented approach.
relational approach,
Moreover,
this research
it
is
not as natural compared to
did not address issues involved in
measuring data quality dimension values.
In other related research efforts that
aim
at
annotating data, self-describing data
files
and
meta-data management have been proposed at the schema level (McCarthy, 1982; McCarthy, 1984;
however,
McCarthy, 1988);
information at the instance
dimension of data
concept.
in
no
level.
been offered
specific solution has
In (Sciore, 1991), annotations are
manipulate such quality
to
used
an object-oriented environment. However, data quality
Therefore, a
more general treatment
is
Still
is
a multi-dimensional
necessary to address the data quality issue.
importantly, no algebra or calculus-based language
annotations associated with the data.
support the temporal
to
provided
is
More
support the manipulation of
to
other research efforts (Codd, 1979; Siegel
&
Madnick,
1991) have dealt with data tagging without either an algebra or a set of quality measures for data
quality dimensions.
Li
Research Focus
In this paper,
we
advocate that data quality must be modeled as an integral part of a data
object rather than simply as a set of functionalities of the
modeling construct of quality data object
in
which each datum
procedures used to indicate the quality of the data object.
that
set of quality algebraic
Many concepts in
methods
the
associated with appropriate data and
We present a
paradigm can be applied
They are fundamental
The reader
via the object-oriented approach.
set of quality
measure methods
timeliness),
is
support the quality data object
to
in our decision to
referred to the
model the quality data
Appendix
object
for a detailed discussion of
constructs in the object-oriented paradigm such as inheritance, method, polymorphism, active
and message can be applied
value,
In this research, each
to
support the quality data object.
datum is modeled
the quality information corresponding to the
as an object called a
datum
is
constructed from a
object
.
n)
datum
object
and
its
datum object As shown in Figure
.
called a quality description object
quality-of link associates a quality description object with
...,
we propose
that supports the manipulation of quality data objects.
the object-oriented
(Banerjee, 1987; Snyder, 1986).
1,
is
specifically,
compute quality dimension values (such as accuracy, consistency, completeness, and
and a
how
DBMS. More
its
datum
object.
associated quality description object
.
The
The composite
is
1,
is-a-
object
called a quality data
Instance variables of a quality description object include descriptive data (qualityjndicatorj, i=
and procedural data (quality_procedurej,
j= 1,
...,
m).
Figure
A
1:
A Quality Data
quality data object called Earnings-Estimate
is
Object
exemplified in Figure
2.
Note
that
Eamings-
Estimate-Qual and Source-1-Qual are quality description objects for Earnings-Estimate and Source-1
respectively.
Note also
Qual,
a quality data object.
is itself
that Source-1,
an attribute of the quality description object Eamings-Estimate-
Earnings- Eatlmats
is-a-quality-of
Figure 2
The Quality Data Object Earnings-Estimate
Section 2 presents the quality data object. Section 3 presents a quality data object algebra that
allows for the construction of methods which conform with the user's quality requirements. Conclusions
and future directions are presented
in Section 4.
The
2.
quality data object
In this section, the quality data object
presented in terms of
is
structure and behavior.
its
Included in the structure of the quality data object are a definition of the components of a quality data
object, the
semantics of
is-a-quality-of,
and the quality data
object schema.
the quality data object are a discussion of dimensions of data quality
Included in the behavior of
and quality measure methods and
messages.
2.1.
Structure of the quality data object
Following the object structure defined
Khoshafian
&
Copeland, 1990; Zdonik
&
in the object-oriented
Maier, 1990),
we
paradigm (Banerjee,
1987;
define two object types for the quality data
object.
Let
I
denote the
such as integer,
An
•
set of
real, string.
system generated
Let
identifiers.
B denote the
set of
base atomic types
Then
object is defined as a primitive object provided that
its
value belongs to
B.
The value of a
primitive object can not be further subdivided. In the context of the quality data object, every
datum
An
•
object
is
a primitive object.
object is defined as a tuple object
are distinct attribute
names and
i
4
's
if its
in Figure
a is-a-quality-of link.
object
which
is
1,
is
is
the quality description object
The composite
object resulting
a n :i n > where
ai's
In the context of the quality
I.
from
is
associated with
this association is
a unit of manipulation. Thus every quality data object
is
its
datum object through
defined as a quality data
a composite object. This
levels.
Semantics of the is-a-quality-of link
Note
that there is
no
specific
mechanism
quality description object with the primitive
generalization
(is-a)
nor the aggregation
the is-a-quality-of link.
and the
...,
a tuple object.
composite property can be nested in an arbitrary number of
2.1.1.
of the form <ai:ii, a2:i2,
are distinct identifiers from
data object, every quality description object
As shown
value
The
is-a-part-of link is
is-a link is
used
in the object-oriented
datum
(is-a-part-oft
used
to associate
object.
More
paradigm
to associate the
specifically,
neither the
construct can be used to capture the semantics of
to associate a subclass object
an object with
its
with
its
super class
assembly object (Banerjee, 1987).
object;
Difference between
2.1.1.1.
The
datum
link
is-a-quality-of
object
and
its
and
is-a
is
is-a-quality-of
conceptually different from
quality description object
semantically different from
is-a
is
because the relation between a
is-a
not a super-class vs. subclass relation.
because the construct inheritance that
associated with is-a
is
It
is
is
not
applicable to the is-a-quality-of link.
Difference between is-a-part-of and i$-a-quality-of
2.1.1.2.
The conceptual difference between
is-a-quality-of
and is-a-part-of
represents the relation between the objects having part and assembly relation;
datum
represents the association between a
To present
object
the semantic difference
and
between
that is-a-part-of
is
whereas
is-a-quality-of
quality description object.
its
is-a-quality-of
and
to object Oj,
then O,
we
first
discuss the
said to have
composite
is-a-part-of,
semantics of is-a-part-of.
If
there
is
reference from Oj.
object of O,.
object,
a is-a-part-of link
The
object Oj
is
from object O,
called the parent object of O,
Based on whether an object has a
and whether
the existence of an object
composite references have been formalized
composite reference,
reference,
and
(4)
(2)
is-a-part-of link
object Oj
depends on the existence of
&
(Kim, Bertino,
parent object, four types of
its
Garza, 1989):
(1)
dependent exclusive
dependent shared composite
(3)
independent shared composite reference.
is-a-part-of
and
is-a-quality-of
comes from
has only two composite references instead of four in the case of
the fact that is-a-
Specifically, let
of
•
Oq
and
A
Od
set of objects to
Od
denote a datum object and
whom Oq
Oq
has a is-a-quality-of
Let del(O
q
)
refer to
.
a quality description object of
link.
We
is-a-part-of.
them as dependent exclusive quality reference and dependent shared quality reference
denote the
component
called the
is
with only one object or more than one
independent exclusive composite reference,
The semantic difference between
quality-of
and the
is
Od
.
Let
Q<O q
)
and del(Od ) denote deletion
respectively. Then,
dependent exclusive quality reference from
Od
to
Oq
means
that
Q(O q
)
= {O d }, and del(O d
)
implies del(O q ).
•
A dependent shared quality reference from O d
then del(Od ) implies del(O q ).
In
existence
If
Q(Oq D (Od
)
)
to
Oq
means
that
its
)
then del(Od ) implies {O d }
dependent quality references the quality description object
depends on the existence of
Q(Oq 2 (O d
corresponding datum
is
is
If
Q(Oq
deleted from
treated as a
object.
).
weak
object
)
= (O d
)
Q(O q ).
and
its
Dependent exclusive quality
references increases storage overhead.
Whereas, dependent shared quality references are beneficial
from the storage view point but causes problems during deletion and update.
2.1.2.
The
quality data object schema
Quality data objects are used as building blocks to construct a quality data object schema. For
exposition purposes,
we
first illustrate, in
Figure
3,
a composite object
company
in the object-oriented
paradigm, which has instance variables Company-Name, CEO-Name, and Earnings-Estimate (each of
the instance variables
is
a primitive object, hence the composite object company).
Company
Company-Name^ (CEO-Nam*}
(^Earnings-Estimate
>
<
Figure
Let us
now suppose
3:
The Object Company
that out of these three primitive objects, the
Estimate are quality sensitive, and are converted into quality data objects as
CEO-Name and
shown
in Figure 4 below.
is-a-quality-of
L
D>
Collection-Procedure
Figure
4:
The Quality Data Object Q-Company
Earnings-
The quality data
objects
communicate with
in the object-oriented
object
it
encapsulated as a unit of manipulation.
is
through pre-defined methods only.
paradigm.
to retrieve the data that
Q-company
In addition,
conforms with
it
It
behaves
has the capabilities
to
in the
That
is,
other
same way as an
object
measure the quality
of data
and
users' quality requirements, as will be discussed in Section 2.2.
Using quality data objects as basic building blocks, more complex objects can be constructed
through other object-oriented constructs such as aggregation
Figure
5, for
(is-a-part-of)
and generalization
(is-a).
In
example, the Directed Acyclic Graph (DAG) constructed with the quality data objects Q-
Company, Q-IT-Department, Q-Finance-Department, and Q-High-Tech-Company forms
a quality
data object schema.
Q-Company
is-a-part-of
is-a-part-of
Q-fT-Department
5:
Quality of Object Schema
have presented the quality data object
unique
construct that
is
paradigm,
now
it
is
Q-Hgh-Tech -Company
Q-Finance-Department
Figure
We
is-a
to the quality
in
terms of
its
Through
structure.
the is-a-quality-of
data object and the other constructs in the object-oriented
possible to construct a quality data object schema.
behavior of the quality data object that will addresses the issues of
The next
how
to
section presents the
measure the quality of
data.
2,2.
Behavior of the quality data object
The multi-dimensional and
(Wang, Reddy,
considering
&
how
Kon, 1992;
a user
hierarchical characteristics of data quality
Wang &
may make
Strong, 1992).
illustrate these
two
characteristics here
decisions based on certain data retrieved from a database.
user must be able to get to the data, which
means and
We
were investigated
privilege to get the data).
means
that the data
must be accessible
by
First the
(the user has the
Second, the user must be able to interpret the data (the user
understands the syntax and semantics of the data). Third, the data must be useful (data can be used as
an input
to the user's decision
making
process).
Finally, the data
must be believable
extent that the user can use the data as a decision input). Resulting from this
dimensions:
accessibility, interpretability, usefulness,
the user, the data
must be available
(exists in
and
some form
believability.
that can
In
list
to the user (to the
are the following four
order to be accessible
be accessed);
to
to
be useful, the data
must be relevant
consider,
among
requirements for making the decision); and
(fits
to
be believable, the user
may
other factors, that the data be complete, timely, consistent, credible, and accurate
Timeliness, in turn, can be characterized by currency (when the data item
and volatility (how long the item remains
valid)
characteristics of data quality provide a conceptual
was
.
stored in the database)
These multi-dimensional and hierarchical
.
framework
for defining the
behavior of the
quality data object.
In general, the behavior of
object
datum
of the quality data object, both
messages meant for
an
is
encapsulated in
objects
its
methods and messages.
and quality description
and update,
their creation, deletion,
objects will
In the context
have methods and
just like objects in the object-oriented
paradigm.
Each message
Message
The
described using the following syntax.
is
:= (receiver)
(receiver) part
is
(message_nameX[(argument)
an
identifier
])
denoting an object that receives and interpret the message.
The (message_name) gives the name of the message which helps the receiving
The (argument) part
message with a particular method.
by the method
in the receiving object.
of the
message
object to associate the
carries data
which
is
required
A message can have zero, one, or more than one arguments, as the
brackets " indicate.
Each method
is
described using the following syntax.
Method_name:
(name of
Invoked_by:
(messaget, message])
Method_action:
(procedural description of the method)
the method)
Only those methods and messages
paper. Below
2-2.1.
we
define key methods that measure data quality.
Currency
The currency dimension
is
solely a characteristic of storage of the data.
currency on a continuous scale from
possible, state
value of
C
is
1
to
1.
to the oldest stored data.
computed dynamically using
indicator value tagged to every instance.
•
related to the data quality aspect are presented in this
The
Let
state
C
would be assigned
We propose to
measure
to data that are as current as
represent the measure for currency (0 £
C
£
1).
The
the creation time of the instance. Creation time is a quality
Depending on the message, the currency method
determine the currency of an individual instance,
can:
determine the average currency of the instances of the
class,
determine the percentage of instances whose currency meets one of the following conditions
(referred to as 6)
(
1 )
A
when compared
to the total instances of the class available in the database:
below or above a user -defined currency
description of the
method
is
level, (2) in
between
a user-defined currency interval.
given below.
Method_name:
Currency _method
Invoked_by:
(receiver) currency(instance_variable)
(receiver) average_currency(instance_variable)
(receiver) 8-currency(instance_variable, 9)
For the message currency, the method returns a pairwise value, (instance value,
Method_action:
currency), for
all
For the message
the instances satisfying the qualification.
average _currency, the currency method returns the average currency of
instances of the instance variable.
method returns
For the message 9-currency,
all
the currency
the percentage of the currency values of the instances that
satisfy the condition 8.
2,2.2.
Volatility
The
time.
state
intrinsic property of the data
was
For example, the fact that George Washington
remains true no matter
may
an
volatility of data is
how
long ago that
be woefully out of date.
We
fact
propose
to
was recorded. On
measure
volatility
Let Xj denote a
random
The
variable,
volatility is
i
=
1, 2,
...,
measured
N, then
V
is
on
computed
-
i=l
N-l
N
J^Xi
10
its
storage
president of the United States
a continuous scale
from
1
via the coefficient of variation,
£ X?
andX=
unrelated to
the other hand, yesterday's stock quote
V=4
X
where S=
first
is
do not change over time) and
refers to data that are not volatile at all (they
that are constantly in flux.
the
which
NX
3
to
1
where
refers to data
denoted by V.
as follows (Kazmier, 1976):
A description of
the
method
is
given below.
Method_name:
Volatility_method
Invoked_by:
(receiver)
Method_action:
The system monitors updates
volatility(instance_variable, qualification_for_instances)
to the value of
an instance variable
that
is
being
modified and simultaneously computes the following three required parameters
order to compute the coefficient of variation:
in
N, the
(a)
total
number
of
sum
of
N
updates, (b) X, the average of
all
updated values, and
(c)
2^ X;
the
,
i=l
squares of updated values.
These three parameters are stored as quality
indicators in the quality description object corresponding to each instance
variable.
all
The method returns
a pairwise value, (instance value, volatility), for
qualified instances.
2.23. Timeliness
Timeliness
situation
is to
is
defined as a function of currency and volatility of a data value. The most stable
have data
for
which the currency
(unchanging) or both. For such data there
data are old (currency =
1)
combining currency and
representing the best and
and highly
is
(entered very recently) or the volatility of
no timeliness problem. The worst
volatile (volatility
volatility via their
1
is
=
1).
We
propose
T= VCV
root-mean square:
to
situation arises
when
measure timeliness by
where
5 T <
1
with
the worst case.
A description of the method is given below.
Method_name:
Timeliness_method
Invoked_by:
(receiver) timeliness(instance_variable, qualification_for_instances)
Method_action:
The method returns a pairwise
qualified instances to the
2.2.4.
value, (instance value, Timeliness), for for
all
message sender.
Accuracy
In general, a user can test the accuracy of the data present in a database with a set of sample
data considered to be accurate by the user. For example, a user
payroll database can
basis of this
test,
first
the user
who wants
check whether his salary (known data)
makes judgment whether
accuracy on a continuous scale from
to 1
where
to
state
accurate) case.
11
is
to
check the accuracy of a
recorded correctly or not.
query the database or
not.
On
the
We propose to measure
refers to best (all accurate)
and
1
the worst (none
A
method
description of the
is
given below.
Mithod_name:
Accuracy_method
Invoked_by:
(receiver)
Method
The
action:
first
accuracy( instance_variable, list_of_known_instances)
argument
in the parenthesis gives the
whose accuracy was
that
were known
to the
of an instance variable
The second argument gives
message sender.
This
known
a
list
of instances
of instances
is
considered as true values. The method computes the percentage, denoted as
p,
of
list
match between the true values and the recorded values and returns
the
(1-p) to
message sender.
Completeness
2.2.5.
Following Ballou and Pazer,
recorded (Ballou
where
required.
name
Pazer, 1987).
refers to the best
state
measure
&
implies that
it
we
define completeness as
We propose
and
1
to
values for a certain variables are
measure completeness on a continuous
the worst case.
has no null instances
all
scale
from
to
1
For a given instance variable, the completeness
in the database,
Using
values recorded for the instance variable are null.
whereas the measure
this, a
1
implies
all
the
user can measure the degree of
completeness of the database regarding an instance variable.
A description of the method
is
given below.
Method_name:
Completeness_method
Invoked_by:
(receiver) completeness(instance_variable)
Method
The method measures
action:
instance variable
the percentage, denoted as p, of
when compared
available in the database
2.2.6.
The
empty
instances of the
to the total instances of the instance variable
and returns p
to the
message sender.
Credibility
credibility of a
datum
in a database is
present in the quality description object of the
computed based on
datum and
(2) the set
Let x be an instance. Let q be the quality indicator of x
(
the user wants to use to
compute the
credibility of
x.
and
(1)
of specifications given
let "J'
assigned to the quality indicator q by the user. Let
4
=0.
The
credibility of x is
the user.
Let uv, be the user's specified value for qt and
4
5j
by
be the number of quality indictors
be the recorded value of the quality indicator q for x in the database. Let
uv, =rv, else
the quality indicator values
computed by
12
8(
w
(
rv
(
be the credibility weight
be a binary variable defined as follows:
the following expression:
let
8i
=1
if
I wi*5j.
A description of the method
is
given below.
Method_name:
Credibility_method
Invoked_by:
(i)
(receiver) credibility-a(instance_variable
[,
(ii)
(quality_indicator, quality _indicator_value, credibility_weight)])
(receiver) credibility-b(instance_variable
[,
(iii)
(quality _indicator, quality_indicator_value
)])
(receiver) credibility-c(instance_variable
[,
(quality_indicator, quality_indicator_value, credibility_weight)],
desired_credibility)
Method_action:
For
the
message
credibility-a,
the
method returns values
instance_variable and their associated credibilities.
quality_indicator
is
If
of
weight for each
not specified (message credibility-b) then the method
assumes equal weight
for each quality indicator specified
by the user and
returns values of the instance variable and their associated credibilities.
credibility-c, the
whose
We
quality.
method returns only those values
credibility is
the
more than or equal
For
of the instance variable,
to the desired_credibility.
have presented the methods and messages that measure the key dimensions of data
They define an important part of the behavior
behavioral component of the quality data object
the user's quality requirements.
is
In the next section,
we
critical
present an algebra for the quality data object
methods
for the quality data object.
A qualify data object algebra
In order to retrieve quality data object instances from a database,
it is
those quality data object instances that conform with requirements for both the
quality description portion. This requires a set of
perform the operations such as
The other
the capability to retrieve data that conforms with
that allows for a systematically construction of retrieval
3.
of the quality data object.
methods
selection, projection,
and
to
necessary to identify
datum
portion and the
perform the comparisons and an algebra
to
join of quality data objects. Section 3.1 presents
quality comparison methods. Section 3.2 presents an algebra that extends the relational algebra to the
quality data object domain.
13
3.1.
Quality comparison methods in a quality-description obj ect
In this subsection
the
methods which are
two
different quality
&
We
Copeland, 1990).
and 0_equal
i_deq?_equal, M_deep_equal,
that underlie these
Two
primitive objects are defined to be QjLeep_equa\
Two
tuple objects are defined to be l_deq>_equal
on the same
the values they take
2_deep_equal
if
l)_deep_equal.
Let O]
0_deep_equal
if
their
datum values and
datum
their values matches.
if
they have the
Two
and
Two
portions are identical.
two quality data objects are i_deep_equal
and Ojare
We
up
to the
"
1
j
02 that
and
set of attributes
if
tuple objects are defined to be
two tuple
Similarly,
on the same attribute are
(i-
i_deep_equal.
is
quality data objects are l_deep_equal
illustrate the
is
Let
o-[
maximum
the
7.
In
order to do
to qi]
,
and the value
of souree-1
and the value of source-2 would correspond
In Figure
to the object 02
7, let o-i
and
02 be
would correspond
to
the same.
is
.
7.
first
o-[
and
02,
.
exemplify the
In Figure
would correspond
to vj
Object c^
However,
values they take on qi 31 are different (v 31 vs. x 31 >.
follows that 0]
we
2,
to vo-
earnings
Source-1
Source-2 would correspond
The quality data object
objects.
because both have the same datum value, v
all
the corresponding
to
v\-[.
two quality data
values they take on qi^qi^qiaare
their
level are identical.
depth of both
it,
be a quality data object in Figure
estimate would correspond to 01, and the value of earnings estimate
would correspond
if
defined as M_deep_equal, denoted by Oi = M 02
above concepts through Figure
2.
'V is
If
first
datum values and
their
if
level are identical.
\_deep_equal, then this relation
notation used in Figure 7 via Figure
qill,
same
the corresponding quality indicator values at the
quality indicator values
O]
Ot
on equality
two comparison methods.
the values they take
denote two tuple objects
of
the context of the quality data object, two quality data objects are defined to be
In
Similarly,
=* 02
if
and some
define the concepts of 0_deep_equal,
on same attributes are 1_deep_equal.
be i_deep_equal
to
if
in detail
also discussed, based
first
attribute are 0_deep_equal.
the values they take
objects are defined
if
two methods are
special cases of these
definitions provided in (Khoshafian
and
comparison methods are discussed
not M_deep_equal to 02.
14
Ot is
Since the
is 1
Ot is
0_deep_equal
_deep _equal to 02 because the
not 2_deep_equal to 02 because the
maximum number
of level of Oj
is 2, it
^
OfcCo)
(qi,.v,)
|,v,)
/\
(4iV v 1l)
(«" 2'
1
(qi
2
.v
-
2)
(qi
3
3.1.1.
v
("VV Wj'ty
)
3
\
v
\
V
12 )(q, 2r 21
(q'
)
Figure
We now
,
7:
3
v
r
3i>
(q'
(^11^11)
Quality Description objects
:
o,
12
.
W3.V3)
\
\
v
,
2
)
(q 2
j
r
v
(qi
3r
2i>
x
3i>
and 02
present the two comparison methods: i_deep_equal and 9_equal.
i_deep_equal .method
A description of the method
is
given below.
Method_name:
i_deep_equal _method
Invoked_by:
(receiver) i_deep_equal(object 1 ,object2, no_of_levels)
Method_action:
This method compares object and object2 and then returns True
object2are i_deep_equal, where
'V is
object!
if
and
the no_of_levels specified by the user,
else returns False.
3.1.2.
6_equal .method
Two
0_deep_equal.
quality data objects are 0_equal,
If
two
objects o,
and 02 have 9_equal then relation
For example, consider 9
9_equal.
objects Oi
the values they take
if
=
{qi,, qi 2/
qin,
qi2il-
is
denoted by
9_M_deq>_equal, then relation
is
is
Oi
The quality description
Similarly one can define 9_i_deep_equal_method and
and 02 are 9_i_deep_equal, then relation
on the
attributes in 9 are
=e 02.
objects o,
and
9_M_deep _equal_method.
denoted by o, =6
(>)
oj
denoted by 01 = e<M) 02. For example, 9 =
If
{
03 are
If
two
two objects o, and 02 ha ve
qii
}
then o, and 02
have
9_M_deep_equal.
A description of the method is given below.
Method_name:
9_equal .method
Invoked_by:
(receiver) 9_equal(objecti,object2 , 9)
Method
This method compares object! and object2
action:
object2
where 9
is
and then returns True
the set of quality indicators specified
False.
15
by the
if
object,
=9
user, else returns
These two quality comparison methods are used
to define the
following quality algebraic
methods.
3.2.
Qua lity
algebraic
In this section,
3.2.1
Selection
methods
we
Method
Selecnon_method
must
selected
first
introduce quality algebraic methods to operate on quality data objects.
selects only a subset of objects
satisfy the selection criterion.
Let
order predicates. This operation creates
collection O,
which
satisfy the predicates
and the predicate q
symbolically denoted as a^(0,p,q),
aq
A
p and
where
(
m
<,
n) objects of type
The predicate p
q.
n objects of type
is
I
method
is
T.
T from
that
each object
Let
p and q be
members
the
on the datum
a constraint
of
object
The selection_method
defined as follows:
is
O) a p(o) a
(O, p,q) = (o (o e
description of the
m
a collection of
constraint on the quality description object.
a
is
O be
from an object collection such
q(o))
given below.
Method_name:
Selection_Method
Invoked_by:
(receiver) selection(object_class, data_constiaint, quality _constraint)
Method_action:
This method checks each instance of the object_class to see whether they
and the quality _constraint, and returns
satisfy data_constraint
object
all
instances of the object_class which satisfy both of these constraints.
\2-2
Union Method
In
O] be
this
union_method the two operand quality data
/
the collection of n objects of type
method
is
a collection of
from the collection 0\ and
when compared
as
u^
(Oi, O2, 9)
In the
objects (where n
is
<,
p
the collection of
<,
n+m)
selects only those instances
to the instances of
^q (Oi O2
,
p
T and O2 be
object collections
must be
of the
same
m objects of type T.
of type T. This
method
type. Let
The
result of
selects all instances
from the collection O2 which are not duplicates
Oi. The logic of the union_method, which
is
symbolically denoted
defined as follows
,
6)
»
{o
I
Voe Oi
above expression,
" -.
)
u
{
o V02 €
I
(ot= 02) a (01=
and 02 are considered duplicates provided
0_deep_equal with respect to
all
O2 3d
Oi a (o =
02)} " is
that their
quality indicators in
16
e
9.
meant
M 02) a
{(o 1
=°o2 ) a
(o,=
to eliminate duplicates.
datum portions
Note
-,
that the
are the
6
02))
Objects 01
same and they
above definition
}
for the
are
union
is
commutative from the view point of the user who defined
9
c»i=
02
9.
In general
it
is
not commutative because,
u
does not mean (oi^cc).
A description of
the
method
is
given below.
Method_name:
Union_method
Invoked_by:
(receiver) union(Oi,C>2,9)
Method_action:
Let resultl be the set of
instances in the object collection 0\. Let result! be
all
O2 such
the subset (need not be strict subset) of instances of
from
result =resultl
In difference_method, the
method
is
u result!.
to
any instance
This method returns the set
any instance
in resultl. Let
result.
Method
3.23 Difference
collection of
0_deep_equal and not 9_equl
result! is not
that
two operand
n objects of type T and O2 be a
a collection of
p
which are not 0_deep_equal
to objects in
logic of the difference_method,
—q (Oi,02,9) =
collection of
p 5
objects (where
object collections
O2
m objects of type T.
n) of type T.
The
with respect to
" (0\
denoted as
O2
,
{olVo,60i 3o2e02,(o=
M
,
must be of the same
all
The
O] be
type. Let
a
result of the difference
result consists of objects only
from Oi
quality indicator specified in 9.
The
9) is defined as follows
o,)
a^ {(o
1
-°o a ) a(o,-
8
02)}}
A description of the method is given below.
Method_name:
Difference_method
Invoked_by:
(receiver) difference(Oi, 02/9)
Method_action:
Let result be the set of
all
0_deep_equal and 9_equal
to
the instances of
Oi except those
that
any instance of O2. This method returns the
are
set
result.
3.2.4 Projection
Let
Method
O be an object collection of m objects of type T. Projection_method generates p (where p < m)
T from the object collection O. Let o be an object in the collection O. The function returns
an object o' of type T from the object o. The projection method also eliminates duplicate objects from the
objects of type
f
q
result.
The
logic of the projection_method,
which
is
symbolically denoted as
fl
(O,
f:T', 9) is
as follows
n
q
(O,
f:T',
9)=((
n
(O,
f:T)}— [oj
I
o o €
v
2
n (O,
17
f:T'),
{(o^Oj) a (o^Oz)}
1
defined
where n
A
(O,
f:D =
(f(o)l
description of the
0€ O)
method
is
given below.
Method_name:
Projection_method
Invoked_by:
(receiver) projecrion(0, 8)
Method_action:
Let
O
7
be the object type whose instances variables are
variables of O.
0\
Let
f
a subset of the instance
be a function which takes an instance of
Let resultl be the set of instances of (J.
Let result
C
O and
resultl
instances generated by eliminating duplicates from the set resultl.
instantiates
be the
set of
The method
returns the set result.
3.2.5
Cartesian Product
Let
Oi be an
Method
object collection of n objects of type Ti,
of type T2. Let o, be an object in the collection
constructs a
new object 01
© 02 of type T3
from
0\ and
01
and
let
02.
/
and
02 be
let
O2
be an object collection of
an object
in the collection O2.
m objects
The method
Objects of type T3 consists of instance variables
from both Ti and T2. The logic of the cartesian_product_method, denoted as
n
q
(O,
f:T', 0) is
defined as
follows
X
q
A
(Oi,02) = (o
I
Vo, e
description of the
Oi V02
method
is
e
02,0 =
0,
©02)
given below.
Method_name:
Cartesian_product_method
Invoked_by:
(receiver) cartesian_product(Oi
Method_action:
Let
O3 be
a
new
and of 02- Let
f
object type
,
O2)
which
will
have
all
the instance variables of
0\
be the function which take instances of Oi and instances of O2
and with these
instances, the function
method returns
the set of instances of O3.
f
instantiates the object type O3.
This
Other algebraic methods such as Intersection_method and Join_method can be defined using the
above defined
five algebraic
methods.
18
Concluding remarks
4.
In this paper,
help users
we have
make judgments
question was
capabilities to
how
investigated
how
with quality information that can
to associate data
of the quality of data for the specific application at hand.
and manage data
to structure
in
such a
measure the quality of data they need and
way
that users could
to retrieve the
Our
research
be equipped with the
data that conforms with their
quality requirements.
Toward
object
object.
is
this goal,
we have proposed
the concept of quality data object in
which each datum
associated with appropriate data and procedures used to indicate the quality of the
Specifically, the is-a-quality-of link is
corresponding quality description
object.
associated quality description object
is
proposed
The composite
to associate a
object constructed
called a quality data object.
access object instances which matches users' quality requirements.
measure methods
that
It
object with
It
its
provides methods which can
also provides a set of quality
we have developed
volatility, timeliness,
a quality data object algebra
comparison methods and an algebra that extends the relational algebra
quality data object domain.
its
from a datum object and
compute quality dimension values including currency,
accuracy, consistency, and completeness. In addition,
that includes quality
It
datum
datum
to the
allows for a systematic construction of retrieval methods for quality
data objects.
The concept of quality data
manufacture of data products.
object presented in the
We envision
paper
is
a
first
that the quality data object
step toward the design and
proposed in
this
paper can be
used as basic building blocks for the design, manufacture, and delivery of quality data products.
enable users to measure the quality of data products according to their chosen
It
will
it
will also
enable users to purchase data products based on their quality requirements. In this manner,
we hope
that the concepts of quality data objects
and quality data products
data reusability.
19
will help
criteria;
improve data quality and
Appendix A
5.
Many
concepts in the object-oriented paradigm can be applied
They are fundamental
object.
approach.
In
this
in
our decision
to
model the quality data
Appendix we discuss how constructs
inheritance, method,
to
support the quality data
object via the object-oriented
in the object-oriented
polymorphism, active value, and message can be exploited
to
paradigm such
as
support the quality
data object.
We
present features of the object-oriented paradigm and relate them to the quality data object.
Modeling Paradigm
objects (Kim, 1989;
Kim,
paradigm,
In the object-oriented
1990). This
paradigm
is
conceptual entities are modeled as
all
particularly interesting to us because both data
quality can be represented as objects, as Figures 1-2 illustrate.
representation schemes for data and
Estimate
is
modeled as an
object
In Figure 2, for
It
its
eliminates the dichotomy of
example, the datum object Earnings-
its
quality.
its
quality description attributes such as Source-1
and
and
and Reporting-
Date are also modeled as objects.
Inheritance Objects in an object hierarchy can inherit both the data and methods from their
parent objects in the object hierarchy (Banerjee, 1987; Snyder, 1986; Zdonik
context of the quality data object,
quality information
just like data
Method The behavior
state of
an
Zdonik
object.
a quality data object
&
of
and methods
an object
Maier, 1990).
is
inherited
Maier, 1990).
by
its
In the
child object, the
Therefore, both quality indicators and quality
automatically inherited.
is
procedures can be reused
(Banerjee, 1987;
whenever
&
in the object-oriented
in the object-oriented
A method
paradigm
is
encapsulated in methods
consists of code that manipulates
In the context of the quality data object,
dimension values are procedure-oriented, and are
paradigm.
mechanisms used
difficult to
to
and returns the
determine data quality
express declaratively.
Therefore, the
procedural capability in the object-oriented paradigm can be used effectively to define quality
procedures in a quality data object.
For example, timeliness of a quality data object
As another example,
oriented and can be encapsulated as a method.
deleted,
and modified by the methods
(Wang, Reddy,
St
In the object-oriented
objects to define different procedures,
of
arguments (Zdonik
&
procedure-
since objects are instantiated,
of the object, the corresponding quality integrity constraints
Kon, 1992) can be embedded
Polymorphism
is
in the definition of these
methods.
paradigm, the same method name can be used in different
and the same method can take
Maier, 1990). This feature
is
different types or different
number
important in the context of the quality data object
because data quality measure methods can be defined differently in different objects with the same
20
name. Moreover, the evaluation of a procedure depends on the type and number of arguments passed
the procedure which, in turn,
depend on
the quality requirements of a user.
One
believability can be invoked with different sets of arguments:
Estimate
if
the
immediate source
(e.g.
the Wall Street Journal)
is
consider additional quality indicators such as source of source
Zacks Investment Research which
is
user
For example, the method
may
believe the Earnings-
credible whereas another user
(e.g.,
to
may
the Wall Street Journal quoted
considered very credible by the investment community) as
important in determining the believability. Using polymorphism, both of the users can use the same
method but with
different sets of arguments.
Active values In the object-oriented paradigm, the values of active instance variables are
computed
is
at
run time based on values of other instance variables (Zdonik
useful in computing data quality dimension values dynamically.
in the
eyes of the beholder (Wang, Kon,
object
need
to
&
&
Maier, 1990). This feature
Since data quality, in a sense,
lies
Madnick, 1993), some quality dimensions of a quality data
be computed dynamically based on
(1)
user requirements and
(2)
data and procedures
encapsulated in the quality data object. For example, timeliness of a quality data object can not be
stored as a value.
It
Messages
messages (Maier
&
must be computed dynamically upon demand, as discussed
In the object-oriented paradigm, objects can
Stein, 1987).
in Section 2.
communicate with one another through
Messages, together with any arguments that
messages, constitute the public interface of an object. This feature
is
handy
may be
passed with the
in the context of quality
data object because the extra complexity introduced in a quality data object can be encapsulated by the
interface of
an object which
is
nothing but a collection of messages.
21
References
6.
HI
|2|
Ballou, D P. & Pazer, H. L. (1987). Cost/Quality Tradeoffs for Control Procedures
Systems. International lournal of Management Science, ii(6), pp. 509-521.
Banerjee,
J.,
et al,. (1987).
Data Model Issues for Object-Oriented Applications.
in
Information
ACM
Transactions
on Office Information Systems. £(1).
|3]
[4|
Bernstein, P. A., et
al.
Chen,
Chen,
CA.
[71
Query Processing
(1981).
System
in a
in Distributed
Database Systems.
for Distributed Databases (SDD-1).
Transactions on Database Systems, 6(4), pp. 602-623.
P.
The Entity-Relationship Model
(1976).
P.
ACM/TODS, L
[61
N. (1981). Concurrency Control
11(2), pp. 185-221.
ACM
[5]
& Goodman,
Bernstein, P. A.
Computing Surveys,
-
Toward
Unified View of Data.
a
pp- 166-193.
An
P. P. (1984).
Algebra for a Directional Binary Entity- Relationship Model. Los Angeles,
1984. pp. 37-40.
Chen,
Entity -Relationsip Approach
P. (1991).
P.
to
Database Design
Wellesley,
.
MA:
Q.E.D.
Information Sciences.
[8]
Chen,
&
P. P.
The Lattice Concept
North-Holland.
Li, L. (1987).
Relationship Approach
[9]
Codd,
ACM,
[10]
Codd,
E. F. (1970).
A
relational
model of data
in Entity Set. In S.
Spaccapaietra (Ed.), Entity-
for large shared data banks.
Communications
of the
capture more meaning.
ACM
practical foundation for productivity, the 1981
ACM
11(6), pp. 377-387.
Extending the relational database model
E. F. (1979).
to
Transactions on Database Systems, i(4), pp. 397-434.
Ill)
Codd,
E. F. (1982).
Relational database:
Turing Award Lecture. Communications
[12]
Codd,
E. F. (1986).
An
The Second
relational.
A
ACM,
of the
25(2), pp. 109-117.
management systems that are claimed to be
on Data Engineering, Los Angeles, CA. 1986. pp.
evaluation scheme for database
International Conference
720-729.
[13]
Date, C.
J.
(1981). Referential Integrity.
The Proceedings
of the 7th International Conference on
Very Large Data bases, Cannes, France. 1981.
MA: Addison-Wesley.
[14]
Date, C.
J.
(1985).
An
Introduction to Database Systems
(15)
Date, C.
J.
(1990).
An
Introduction to Database Systems (5th ed.). Reading,
[16]
Denning, D.
[17]
Fernandez, E. B., Summers, R.
MA: Addison-Wesley.
[18]
Hoffman,
L.
E.
J.
&
Denning,
(1977).
P.
J.
(1979).
Data Security.
C, & Wood,
Modern Methods
for
C
.
Reading,
ACM
MA: Addison-Wesley.
Comp. Surv., U(3).
(1981). Database Security
and Integrity
Computer Security and Privacy
.
.
Englewood
Readings,
Cliffs, NJ:
Prentice Hall.
[19]
Hsiao, D. K., Kerr, D. S., & Madnick, S. M. (1978). Privacy and Security of Data Communications
and Data bases. Proceedings of the 4th International Conference on Very Large Data Bases, 1978.
[20]
Huh,
[21]
Johnson,
Y. U., et
J.
al.
(1990).
Data Quality. Information and Softvmre Technology, 32(8), pp. 559-565.
R., Leitch, R. A.,
&
Neter,
J.
(1981). Characteristics of Errors in
Accounts Receivable and
Inventory Audits. Accounting Review, 5JH April), pp. 270-293.
[22]
Kazmier,
L.
J.
(1976). Business Statistics
.
New
York, N.Y.:
22
McGraw
Hill
Company.
[23
Khoshafian, S. N. & Copeland, G. P. (1990). Object Identity. In
San Mateo, CA: Morgan Kaufmann.
S. B.
Zdonik&
D. Maier (Ed.), (pp.
37-46).
[24
[25
Kim, W., Lochovsky, Frederick H. (1989). Object-Oriented Concepts, Databases, and Applications
(1 ed.). New York: Addison-Wesley.
Kim, W.
(1990).
Databases: Definition and Research Directions.
Object-Oriented
IEEE
Transactions on Knowledge and Data Engineering, 2(3).
[26
[27]
Kim, W., Bertino, E., & Garza, J. F. (1989). Proceedings of the 1989 ACM SIGMOD Internaltional
Conference on the Management of Data. ACM SIGMOD, Portland, Oregan. 1989. pp. 337-347.
&
Korth, H.
Silberschatz, A. (1986). Database System Concepts
New
.
York: McGraw-Hill Book
Company.
[28
Laudon, K. C. (1986). Data Quality and Due Process in Large Interorganizational Record Systems.
Communications of the ACM, 220), pp. 4-11.
[29
Liepins, G. E.
& Uppuluri, V. R. R. (1990). Data Quality Control: Theory and Pragmatics (pp. 360).
Marcel Dekker, Inc.
New York:
[30
Liepins, O. E. (1989).
Sound Data Are a Sound Investment. Quality Programs, (September), pp.
61-
63.
[31
&
Maier, D.
Cambridge,
[32
Martin,
Stein,
(1987).
J.
Development and Implentation of an Object-Oriented
(1973). Security, Accuracy,
J.
DBMS
.
MA: MIT Press.
and Privacy
in
Computer Systems
.
Englewood
Cliffs, NJ:
Prentice Hall.
[33
Maxwell,
B. S. (1989).
Beyond "Data
Validity":
Improving the Quality of HRIS Data. Personnel,
,
pp. 48-58.
[34
McCarthy,
L.
J.
(1982). Metadata
Management
for
Large Statistical Databases. Mexico City,
Mexico. 1982. pp. 234-243.
[35
McCarthy,
L. (1984). Scientific Information
J.
School, Monterey,
CA.
= Data + Meta-data. U.S. Naval Postgraduate
1984.
A New Tool for Scientific Information.
J. L. (1988). The Automated Data Thesaurus:
Proceedings of the 11th International Codata Conference, Karlsruhe, Germany. 1988.
[36
McCarthy,
[37]
Qian, X.
&
Wiederhold, G. (1986). Knowledge-based integrity constraint
on Very Large Data Bases, 1986. pp. 3-12.
validation.
The Twelfth
International Conference
[38]
Redman,
[39]
Sciore, E. (1991).
[40
Siegel,
T. C. (1992).
Data Quality: Management and Technology
.
New
York, NY: Bantam Books.
Using Annotations to Support Multiple Kinds of Versioning in an Object-Oriented
Database System. ACM Transactions on Database Systems, l£(No. 3, September 1991), pp. 417-438.
M.
&
Madnick,
S.
E. (1991).
A
metadata approach
to
resolving semantic
conflicts.
Barcelona, Spain. 1991.
[41
Snyder, A. (1986). Encapsulation and Inheritance
OOPSLA-86,
[42
Ullman,
J.
in
Object-Oriented Programming Languages.
1986. pp. 38-45.
D. (1982). Principles of Database Systems
.
Rockville, Maryland,
USA: Computer
Science Press.
[43
&
Towards Total Data Quality Management (TDQM). In R. Y.
and Perspectives Englewood Cliffs, NJ:
Wang,
R. Y.
Wang
(Ed.), Information Technology in Action: Trends
Kon, H.
B. (1992).
Prentice Hall.
23
[44|
[45]
Wang, R. Y., Kon, H. B., k Madnick, S. E. (1993). Data Quality Requirements Analysis and
Modeling in Data Engineering, in the Proceedings of the 9th International Conference on Data
Engingeenng, Vienna. 1993.
Wang, R. Y., Reddy, M. P., & Kon, H. B. (1992). Toward Quality Data: An Attribute-based
Approach, to appear in the journal of Decision Support Systems (DSS), Special Issue on
Information Technologies and Systems,.
Y. R. & Madnick, S. E. (1990). A Polygen Model for Heterogeneous Database Systems: The
Source Tagging Perspective. Brisbane, Australia. 1990. pp. 519-538.
[46]
Wang,
[47]
Wang,
&
Y. R.
Strong, D. (1992). Dimensions of Data Quality:
Beyond Accuracy. Submitted
for
Publication.
[481
Zarkovich. (1966). Quality of
United Nations.
[49]
Zdonik,
S.
California:
B.
&
Data
Maier, D. (1990). Readings
Morgan Kaufmann
(I
Statistical
A
.
Rome: Food and Agriculture Organization
in
Publishers, Inc.
24
~t
Object Olriented Database Systems (pp.
of the
629).
Date Due
Lib-26-67
MIT LIBRARIES
3
TOflO
00flMb55E
5
Download