On Rules Discovery from Incomplete Information Systems

Agnieszka Dardzinska*,+ and Zbigniew W. Ras+,#
*) Bialystok Technical Univ., Dept. of Mathematics, ul. Wiejska 45A, 15-351, Bialystok, Poland
+) UNC-Charlotte, Department of Computer Science, Charlotte, N.C. 28223
#) Polish Academy of Sciences, Institute of Computer Science, Ordona 21, Warsaw, Poland
adardzin@uncc.edu and ras@uncc.edu
Abstract: In this paper we present a new rule-discovery method for incomplete Information Systems (IS), which are generalizations of the information systems introduced by Pawlak [4]. The proposed method has some similarities with the system LERS [1]. It is a bottom-up strategy which first generates sets of weighted objects described by single-value properties. Some pairs of these generated sets are used for constructing rules, and these pairs are marked as successful. All pairs whose number of supporting objects is below some threshold value are marked as well. All remaining (unmarked) pairs are used to construct new sets of weighted objects described by two-value properties. Some of them are in a supporting relationship with sets of weighted objects described by single-value properties. By a supporting relationship we mean a relationship identifying a rule which can be extracted from our incomplete IS. This process is continued recursively by moving to sets of weighted objects described by k-value properties, where k ≥ 2. Our strategy can also be used for chasing incomplete information systems with rules discovered from them. In this way, our system can be seen as a new tool for handling incompleteness in IS.
1. Introduction
There are a number of strategies which allow us to find rules describing decision attributes in terms of classification attributes; for instance, systems such as LERS [1], AQ18, Rosetta [3], and C4.5 [5]. Although the number of rule-discovery methods is still increasing, most of them are developed under the assumption that the information about objects in the information system is either precisely known or not known at all. This implies that either one value of an attribute is assigned to an object as its property or no value is assigned to it (instead of "no value" we use the term null value).
The problem of inducing rules from systems with incomplete attribute values represented as sets of possible values was discussed, for instance, by Kryszkiewicz and Rybinski [2] and by Ras and Joshi [6]. In this paper, we present a new strategy for discovering rules in incomplete information systems. The constraints placed on which values of attributes can be assigned to objects are as weak as possible. Quite often the user's knowledge about objects is not exact, and even asking the user to assign a set of possible attribute values to an object is a requirement which is too strong to be forced as a constraint. In our system, we allow not only sets of attribute values as values of an attribute for an object, but also a confidence assigned to each of these attribute values. For instance, by assigning the value {(brown, 1/3), (black, 2/3)} of the attribute Color of Hair to an object x, we say that the confidence that object x has brown hair is 1/3, whereas the confidence that x has black hair is 2/3.
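A value of this kind can be represented directly as a mapping from attribute values to confidences; the sketch below is our own illustration of the Color of Hair example (the variable name is hypothetical, not part of the paper):

```python
# Hypothetical encoding of a(x) for one object x: a dict mapping each
# possible attribute value to its confidence; confidences sum to 1.
hair_color_of_x = {"brown": 1/3, "black": 2/3}

# The confidences of all listed values must sum to 1.
assert abs(sum(hair_color_of_x.values()) - 1.0) < 1e-9

# Confidence that x has black hair:
print(hair_color_of_x["black"])  # 0.6666666666666666
```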
2. Discovering Rules in Incomplete IS
We start with the definition of an incomplete information system, which is a generalization of the information system introduced by Pawlak [4].
By an incomplete Information System (IS) we mean S = (X, A, V), where X is a finite set of objects, A is a finite set of attributes, and V = {Va : a ∈ A} is a finite set of values of attributes. The set Va is the domain of attribute a. We assume that for each attribute a ∈ A and each x ∈ X, a(x) = {(ai, pi) : i ∈ Ja(x)}, where (∀i ∈ Ja(x))[ai ∈ Va] and Σ{pi : i ∈ Ja(x)} = 1.
A null value assigned to an object is interpreted as all possible values of the attribute, with equal confidence assigned to each of them. Figure 1 gives an example of an incomplete information system S = ({x1, x2, x3, x4, x5, x6, x7, x8}, {a, b, c, d, e}, V).
Clearly, a(x1) = {(a1,1/3), (a2,2/3)}, a(x2) = {(a2,1/4), (a3,3/4)}, …
Let us try to extract rules from S describing attribute e in terms of attributes {a, b, c, d}, following a strategy similar to LERS [1]. So, we start by identifying the sets of objects in X having properties a1, a2, a3, b1, b2, c1, c2, c3, d1, d2, and next we find their relationships with the sets of objects in X having properties e1, e2, e3. Let us start with a1*, which is equal to {(x1,1/3), (x3,1), (x5,2/3)}. The justification is quite simple: only these three objects may have that property. Object x3 has property a1 for sure. The confidence that x1 has property a1 is 1/3, since (a1,1/3) ∈ a(x1). In a similar way we justify the property a1 for object x5.
X    a                    b                    c                    d    e
x1   (a1,1/3), (a2,2/3)   (b1,2/3), (b2,1/3)   c1                   d1   (e1,1/2), (e2,1/2)
x2   (a2,1/4), (a3,3/4)   (b1,1/3), (b2,2/3)                        d2   e1
x3   a1                   b2                   (c1,1/2), (c3,1/2)   d2   e3
x4   a3                                        c2                   d1   (e1,2/3), (e2,1/3)
x5   (a1,2/3), (a2,1/3)   b1                   c2                        e1
x6   a2                   b2                   c3                   d2   (e2,1/3), (e3,2/3)
x7   a2                   (b1,1/4), (b2,3/4)   (c1,1/3), (c2,2/3)   d2   e2
x8   a3                   b2                   c1                   d1   e3
Figure 1.
So, for the values of the classification attributes we get:
a1* = {(x1,1/3), (x3,1), (x5,2/3)},
a2* = {(x1,2/3), (x2,1/4), (x5,1/3), (x6,1), (x7,1)},
a3* = {(x2,3/4), (x4,1), (x8,1)},
b1* = {(x1,2/3), (x2,1/3), (x4,1/2), (x5,1), (x7,1/4)},
b2* = {(x1,1/3), (x2,2/3), (x3,1), (x4,1/2), (x6,1), (x7,3/4), (x8,1)},
c1* = {(x1,1), (x2,1/3), (x3,1/2), (x7,1/3), (x8,1)},
c2* = {(x2,1/3), (x4,1), (x5,1), (x7,2/3)},
c3* = {(x2,1/3), (x3,1/2), (x6,1)},
d1* = {(x1,1), (x4,1), (x5,1/2), (x8,1)},
d2* = {(x2,1), (x3,1), (x5,1/2), (x6,1), (x7,1)}.
For the values of the decision attribute we get:
e1* = {(x1,1/2), (x2,1), (x4,2/3), (x5,1)},
e2* = {(x1,1/2), (x4,1/3), (x6,1/3), (x7,1)},
e3* = {(x3,1), (x6,2/3), (x8,1)}.
The next step is to propose a method for checking the relationship between classification attributes and the decision attribute. For any two sets ci* = {(xi, pi)}i∈N and ej* = {(yj, qj)}j∈M, where pi ≠ 0 and qj ≠ 0, we propose that:
{(xi, pi)}i∈N ⊆ {(yj, qj)}j∈M iff the support of the rule [ci → ej] is above some threshold value.
So, the relationship between ci* and ej* depends only on how high the support of the corresponding rule [ci → ej] is. Coming back to our example, let us notice that we may have a successful set-theoretical inclusion between objects from a1* and objects from e3*, whereas the set-theoretical inclusion between objects from a1* and objects from e1* may fail. The justification of this claim is the following: object x3 is a member of a1* but it does not belong to e1*. At the same time, the other set-theoretical inclusion is feasible because it is not certain that x1 and x5 belong to a1*. Assuming that the relationship a1* ⊆ e3* is successful, what can we say about the support and confidence of the rule a1 → e3?
Object x1 supports a1 with confidence 1/3 and it supports e3 with confidence 0.
Object x3 supports a1 with confidence 1 and it supports e3 with confidence 1.
Object x5 supports a1 with confidence 2/3 and it supports e3 with confidence 0.
We calculate the support of the rule [a1 → e3] as:
1/3 · 0 + 1 · 1 + 2/3 · 0 = 1.
Similarly, the support of the term a1 is 1/3 + 1 + 2/3 = 2. So, the confidence of the above rule will be 1/2, if we follow the standard strategy for calculating the confidence of a rule.
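The computation above can be sketched in Python as follows (our illustration, reusing the weighted-set encoding; the function names are ours):

```python
from fractions import Fraction as F

# Weighted sets copied from the example: a1* and e3*.
a1_star = {"x1": F(1, 3), "x3": F(1), "x5": F(2, 3)}
e3_star = {"x3": F(1), "x6": F(2, 3), "x8": F(1)}

def support(lhs, rhs):
    """sup(t -> d): sum over objects of conf(t) * conf(d)."""
    return sum(p * rhs.get(x, F(0)) for x, p in lhs.items())

def confidence(lhs, rhs):
    """conf(t -> d): support of the rule divided by the support of the term t."""
    return support(lhs, rhs) / sum(lhs.values())

print(support(a1_star, e3_star))     # 1
print(confidence(a1_star, e3_star))  # 1/2
```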
Now, let us follow a strategy which has some similarity with LERS. All inclusions [⊆] will be automatically marked, assuming that the confidence and support of the corresponding rules are both above some threshold values (which are given by the user). In the current example we take 1/2 as the threshold for confidence and 1 as the threshold for support.
Assume again that ci* = {(xi, pi)}i∈N and ej* = {(yj, qj)}j∈M. The algorithm first checks the support of the rule ci → ej. If the support is below the threshold value, then the corresponding relationship {(xi, pi)}i∈N ⊆ {(yj, qj)}j∈M does not hold and it is not considered in later steps. Otherwise, the confidence of the rule ci → ej is checked. If that confidence is above or equal to the assumed threshold value, the rule is approved and the corresponding relationship {(xi, pi)}i∈N ⊆ {(yj, qj)}j∈M is marked. Otherwise this relationship remains unmarked.
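Under our reading, this support-first, confidence-second test can be sketched as a small function (the thresholds default to the values used in the example; the function name is ours):

```python
from fractions import Fraction as F

def mark(lhs, rhs, min_sup=F(1), min_conf=F(1, 2)):
    """Return 'negative', 'positive' or 'unmarked' for the inclusion
    lhs* ⊆ rhs*, checking support first and confidence second."""
    sup = sum(p * rhs.get(x, F(0)) for x, p in lhs.items())
    if sup < min_sup:
        return "negative"              # support too low: relationship discarded
    conf = sup / sum(lhs.values())
    return "positive" if conf >= min_conf else "unmarked"

a1_star = {"x1": F(1, 3), "x3": F(1), "x5": F(2, 3)}
e1_star = {"x1": F(1, 2), "x2": F(1), "x4": F(2, 3), "x5": F(1)}
e3_star = {"x3": F(1), "x6": F(2, 3), "x8": F(1)}

print(mark(a1_star, e1_star))  # negative  (sup = 5/6 < 1)
print(mark(a1_star, e3_star))  # positive  (sup = 1, conf = 1/2)
```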
Applying this strategy to our example, with Msup = 1 as the threshold value for support and Mconf = 0.5 as the threshold value for confidence, we get:
a1* ⊆ e1* (sup = 5/6 < 1) – marked negative
a1* ⊆ e2* (sup = 1/6 < 1) – marked negative
a1* ⊆ e3* (sup = 1 ≥ 1 and conf = 0.5 ≥ 0.5) – marked positive
a2* ⊆ e1* (sup = 11/12 < 1) – marked negative
a2* ⊆ e2* (sup = 5/3 ≥ 1 and conf ≈ 0.51 ≥ 0.5) – marked positive
a2* ⊆ e3* (sup = 2/3 < 1) – marked negative
a3* ⊆ e1* (sup = 17/12 ≥ 1 and conf ≈ 0.51 ≥ 0.5) – marked positive
a3* ⊆ e2* (sup = 1/3 < 1) – marked negative
a3* ⊆ e3* (sup = 1 ≥ 1 but conf ≈ 0.36 < 0.5) – not marked
b1* ⊆ e1* (sup = 2 ≥ 1 and conf ≈ 0.72 ≥ 0.5) – marked positive
b1* ⊆ e2* (sup = 3/4 < 1) – marked negative
b1* ⊆ e3* (sup = 0 < 1) – marked negative
b2* ⊆ e1* (sup = 7/6 ≥ 1 but conf ≈ 0.22 < 0.5) – not marked
b2* ⊆ e2* (sup = 17/12 ≥ 1 but conf ≈ 0.27 < 0.5) – not marked
b2* ⊆ e3* (sup = 8/3 ≥ 1 and conf ≈ 0.51 ≥ 0.5) – marked positive
c1* ⊆ e1* (sup = 5/6 < 1) – marked negative
c1* ⊆ e2* (sup = 5/6 < 1) – marked negative
c1* ⊆ e3* (sup = 3/2 ≥ 1 but conf ≈ 0.47 < 0.5) – not marked
c2* ⊆ e1* (sup = 2 ≥ 1 and conf ≈ 0.66 ≥ 0.5) – marked positive
c2* ⊆ e2* (sup = 1 ≥ 1 but conf ≈ 0.33 < 0.5) – not marked
c2* ⊆ e3* (sup = 0 < 1) – marked negative
c3* ⊆ e1* (sup = 1/3 < 1) – marked negative
c3* ⊆ e2* (sup = 1/3 < 1) – marked negative
c3* ⊆ e3* (sup = 7/6 ≥ 1 and conf ≈ 0.64 ≥ 0.5) – marked positive
d1* ⊆ e1* (sup = 5/3 ≥ 1 but conf ≈ 0.48 < 0.5) – not marked
d1* ⊆ e2* (sup = 5/6 < 1) – marked negative
d1* ⊆ e3* (sup = 1 ≥ 1 but conf ≈ 0.28 < 0.5) – not marked
d2* ⊆ e1* (sup = 3/2 ≥ 1 but conf ≈ 0.33 < 0.5) – not marked
d2* ⊆ e2* (sup = 4/3 ≥ 1 but conf ≈ 0.30 < 0.5) – not marked
d2* ⊆ e3* (sup = 5/3 ≥ 1 but conf ≈ 0.37 < 0.5) – not marked
The next step is to build terms of length 2 from the unmarked pairs, i.e. those with support ≥ 1 and confidence < 0.5. We propose the following definition for concatenating any two sets ci* = {(xi, pi)}i∈N and ej* = {(yj, qj)}j∈M, where K = M ∩ N: (ci ∧ ej)* = {(xi, pi · qi)}i∈K.
Following this definition, we get:
(a3 ∧ c1)* ⊆ e3* (sup = 1 ≥ 1 and conf = 0.8 ≥ 0.5) – marked positive
(a3 ∧ d1)* ⊆ e3* (sup = 1 ≥ 1 and conf = 0.5 ≥ 0.5) – marked positive
(a3 ∧ d2)* ⊆ e3* (sup = 0 < 1) – marked negative
(b2 ∧ d2)* ⊆ e1* (sup = 2/3 < 1) – marked negative
(b2 ∧ c2)* ⊆ e2* (sup = 2/3 < 1) – marked negative
(c1 ∧ d1)* ⊆ e3* (sup = 1 ≥ 1 and conf = 0.5 ≥ 0.5) – marked positive
(c1 ∧ d2)* ⊆ e3* (sup = 1/2 < 1) – marked negative
3. Algorithm ERID (Extracting Rules from Incomplete Decision Systems)
We are ready to present the algorithm for extracting rules from an incomplete information system. To simplify the presentation, new notation (used only for the purpose of this algorithm) will be introduced. So, let us assume that S = (X, A ∪ {d}, V) is a decision system, where X is a set of objects, A = {a[i] : 1 ≤ i ≤ I} is a set of classification attributes, and Vi = {a[i, j] : 1 ≤ j ≤ J(i)} is the set of values of attribute a[i], i ≤ I. We also assume that d is the decision attribute, where Vd = {d[m] : 1 ≤ m ≤ M}. Finally, we assume that a(i1, j1) ∧ a(i2, j2) ∧ … ∧ a(ir, jr) is denoted by the term [a(ik, jk)]k∈{1,2,…,r}, where all i1, i2, …, ir are distinct integers and jp ∈ J(ip), 1 ≤ p ≤ r.
Algorithm for Extracting Rules from Incomplete Decision System S.
Algorithm ERID(S, λ1, λ2)
S – incomplete decision system,
λ1 – minimum support threshold,
λ2 – minimum confidence threshold.

i := 1
while i ≤ I do
begin
  j := 1
  while j ≤ J(i) do
  begin
    for m := 1 to M do
    begin
      if sup(a[i, j] → d[m]) < λ1 then mark[a[i, j]* ⊆ d[m]*] := negative;
      if sup(a[i, j] → d[m]) ≥ λ1 and conf(a[i, j] → d[m]) ≥ λ2
        then begin
               mark[a[i, j]* ⊆ d[m]*] := positive;
               output [a[i, j] → d[m]]
             end
    end;
    j := j + 1
  end;
  i := i + 1
end

Ik := {ik};   /* ik – index randomly chosen from {1, 2, …, I} */
for all jk ∈ J(ik) do [a(ik, jk)]ik∈Ik := a(ik, jk);
for all i, j such that both rules
    [a(ik, jk)]ik∈Ik → d[m],
    a[i, j] → d[m] are not marked and i ∉ Ik do
begin
  if sup([a(ik, jk)]ik∈Ik ∧ a[i, j] → d[m]) < λ1
    then mark[([a(ik, jk)]ik∈Ik ∧ a[i, j])* ⊆ d[m]*] := negative;
  if sup([a(ik, jk)]ik∈Ik ∧ a[i, j] → d[m]) ≥ λ1 and
     conf([a(ik, jk)]ik∈Ik ∧ a[i, j] → d[m]) ≥ λ2
    then begin
           mark[([a(ik, jk)]ik∈Ik ∧ a[i, j])* ⊆ d[m]*] := positive;
           output [[a(ik, jk)]ik∈Ik ∧ a[i, j] → d[m]]
         end
    else begin
           Ik := Ik ∪ {i};
           [a(ik, jk)]ik∈Ik := [a(ik, jk)]ik∈Ik ∧ a[i, j]
         end
end
The complexity of this algorithm is similar to the complexity of LERS.
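A condensed Python sketch of the whole procedure, under our reading of the description above, is given below. It is an approximation, not the authors' implementation: names and data layout are ours, and the extension step simply extends every unmarked term by one value of an unused attribute, a simplification of the pairwise combination of unmarked rules in the pseudocode.

```python
from fractions import Fraction as F
from itertools import product

def erid(X, attrs, dec, min_sup, min_conf):
    """X: {object: {attribute: {value: confidence}}}, confidences summing
    to 1 per attribute; attrs: classification attributes; dec: decision
    attribute.  Returns approved rules as (term, decision value) pairs,
    a term being a tuple of (attribute, value) pairs."""

    def star(term):
        # Weighted set of objects supporting the conjunction `term`:
        # confidences of the conjuncts are multiplied per object.
        out = {}
        for x, row in X.items():
            p = F(1)
            for a, v in term:
                p *= row[a].get(v, F(0))
            if p != 0:
                out[x] = p
        return out

    def sup_conf(term, dv):
        ts = star(term)
        s = sum(p * X[x][dec].get(dv, F(0)) for x, p in ts.items())
        t = sum(ts.values())
        return s, (s / t if t else F(0))

    values = {a: sorted({v for row in X.values() for v in row[a]}) for a in attrs}
    dvals = sorted({v for row in X.values() for v in row[dec]})
    rules = []
    candidates = [((a, v),) for a in attrs for v in values[a]]  # length-1 terms
    while candidates:
        unmarked = []
        for term, dv in product(candidates, dvals):
            s, c = sup_conf(term, dv)
            if s < min_sup:
                continue                     # "marked negative": dropped
            if c >= min_conf:
                rules.append((term, dv))     # "marked positive": rule output
            else:
                unmarked.append(term)        # kept for extension
        # Build longer terms from the unmarked ones.
        longer = set()
        for term in unmarked:
            used = {a for a, _ in term}
            for a in attrs:
                if a not in used:
                    for v in values[a]:
                        longer.add(tuple(sorted(term + ((a, v),))))
        candidates = sorted(longer)
    return rules
```

On the system of Figure 1, called with min_sup = 1 and min_conf = 1/2, this sketch reproduces the positive length-1 rules derived in Section 2, together with whatever longer terms pass both thresholds.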
4. Conclusion
Missing data values cause problems both in knowledge extraction from information systems and in query-answering systems. Quite often missing data are partially specified, which means that the missing value is an intermediate value between the unknown value and a completely specified value. Clearly, there is a number of possible intermediate values for a given attribute. The algorithm ERID presented in this paper can be used as a new tool for chasing values in incomplete information systems with rules discovered from them. We can keep repeating this strategy until a fixed point is reached. To our knowledge, no other such tools are available for incomplete information systems of the type presented in this paper.
5. References
[1] Grzymala-Busse, J., "A new version of the rule induction system LERS", Fundamenta Informaticae, Vol. 31, No. 1, 1997, 27-39
[2] Kryszkiewicz, M., Rybinski, H., "Reducing information systems with uncertain attributes", Proceedings of ISMIS'96, LNCS/LNAI, Springer-Verlag, Vol. 1079, 1996, 285-294
[3] Ohrn, A., Komorowski, J., Skowron, A., Synak, P., "A software system for rough data analysis", Bulletin of the International Rough Sets Society, Vol. 1, No. 2, 1997, 58-59
[4] Pawlak, Z., "Rough sets - theoretical aspects of reasoning about data", Kluwer, Dordrecht, 1991
[5] Quinlan, J.R., "C4.5: Programs for Machine Learning", Morgan Kaufmann Publishers, San Mateo, CA, 1993
[6] Ras, Z.W., Joshi, S., "Query approximate answering system for an incomplete DKBS", Fundamenta Informaticae, IOS Press, Vol. 30, No. 3/4, 1997, 313-324