Normal Symbolic Form
Marc CSERNEL
Paris IX Dauphine
Inria(Axis)
Supported by the French embassy in Brazil with the help of
the cultural and technical services in RECIFE
Recife Nov 2004
Outline
• Symbolic descriptions with rules
• Comparison of S.D.
• Notion of coherence
• Description Potential
• Normal Symbolic Form (NSF)
• Computation using N.S.F.
• Problem of memory growth
• The effective growth
• Perspectives and conclusion
Symbolic descriptions

Classical individuals
             Color    Size
Beetle 1     Blue     8.0
Beetle 2     Red      10.0

Symbolic descriptions
             color         size
species 1    {Blue,Red}    [7:12]
species 2    {Yellow}      [8:9]

• S.D. are well adapted to describing species, classes and collections.
• S.D. make it possible to take variability and uncertainty into account.
• A symbolic description represents an intension.
• Usually the extension of an S.D. is made up of usual individuals.
• S.D. are a good instrument for summarizing information.
Background Knowledge
Background knowledge can be introduced as rules
• Two kinds of rules
– hierarchical (mother-daughter):
if wings ∈ {absent} then wings_colour = NA
– logical:
if wings_colour ∈ {red} then Thorax_colour ∈ {blue}
• Dependencies reduce the description space
– they introduce holes in the description space
• But hierarchical dependencies also reduce the number of dimensions.
NA semantics
• NA means that the variable is not applicable (or non-existent, or meaningless) if the premise is true.
• NA should not be considered as a value, but for convenience we write: var = NA.
– remark 1: hierarchical rules induce a kind of
inheritance.
– remark 2: As NA is not a possible value of a variable,
if a variable is Not Applicable, it can’t have any value.
The Importance of the rules
Similarity problem
If we consider two symbolic descriptions:
x = [a ∈ {a1, a2, a3, a4}] ^ [b ∈ {b1, b2, b3, b4}]
y = [a ∈ {a3, a4}] ^ [b ∈ {b2, b3, b4, b5}]
• if a ∈ {a1, a2} then b = NA
[Figure: two grids with axes a1..a4 and b1..b5 showing x and y]
Once the rule is taken into account, the objects look more similar.
Problem of Identification
If we consider two objects:
x = [a ∈ {a3, a4}] ^ [b ∈ {b2, b3}]   (x: full white)
y = [a ∈ {a2, a3}] ^ [b ∈ {b3, b4}]   (y: grey shade)
if a ∈ {a3} then b ∈ {b1, b2, b4}
[Figure: two grids with axes a1..a4 and b1..b4; x is drawn in white and y in grey]
These objects cannot always be discriminated
Our needs
• Be able to compare S.D. constrained by rules
• Then carry out data analysis, data mining, ...
• Especially distance computation:
\[ d(a,b) = \frac{\pi(a \oplus b) - 0.5\,(\pi(a) + \pi(b))}{\pi(a \oplus b)} \]
• where \oplus is the join operator
• and \pi(a) is the description potential of a
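The distance above can be tried out on the two descriptions x and y of the similarity slide. This is only a minimal sketch under stated assumptions: descriptions are nominal and stored as hypothetical Python dicts of sets, no rule is applied, the join is taken as the variable-wise union, and the distance is read as (π(a ⊕ b) − 0.5 (π(a) + π(b))) / π(a ⊕ b).

```python
from math import prod

def potential(sd):
    """Description potential without rules: product of the set sizes."""
    return prod(len(values) for values in sd.values())

def join(a, b):
    """Join of two nominal descriptions, assumed to be the variable-wise union."""
    return {var: a[var] | b[var] for var in a}

def distance(a, b):
    """(pi(join) - 0.5 * (pi(a) + pi(b))) / pi(join)."""
    pj = potential(join(a, b))
    return (pj - 0.5 * (potential(a) + potential(b))) / pj

# x and y from the similarity slide, ignoring the rule.
x = {"a": {"a1", "a2", "a3", "a4"}, "b": {"b1", "b2", "b3", "b4"}}
y = {"a": {"a3", "a4"}, "b": {"b2", "b3", "b4", "b5"}}
print(distance(x, y))  # (20 - 0.5 * (16 + 8)) / 20 = 0.4
```

With rules, the same function would need the coherent potential instead of the plain product.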
The Coherence
• An individual is coherent if its description respects the rules.
• The coherent part of an S.D. is the part of the description which respects the rules.
• An S.D. is coherent if a coherent part exists.
• An S.D. is fully coherent when all the H-volume it describes is coherent.
– if wings ∈ {absent} then wings_colour = NA

      wings               wings_colour
d1    {absent}            {blue, yellow, red}
d2    {absent, present}   {blue, yellow, red}
d3    {present}           {blue, yellow, red}

• d1 is not coherent, d2 is coherent, d3 is fully coherent
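The three levels of coherence can be checked by brute force on d1, d2 and d3. The sketch below assumes a hypothetical dict-of-sets representation and hard-codes the single rule "if wings is absent then wings_colour = NA"; it enumerates every individual of a description and tests the rule on each.

```python
from itertools import product

NA = "NA"

def respects_rule(individual):
    """Hierarchical rule: if wings is absent then wings_colour must be NA."""
    return individual["wings"] != "absent" or individual["wings_colour"] == NA

def coherence(sd):
    """Classify a description by enumerating all the individuals it covers."""
    names = list(sd)
    individuals = [dict(zip(names, combo))
                   for combo in product(*(sorted(sd[n]) for n in names))]
    ok = [respects_rule(ind) for ind in individuals]
    if all(ok):
        return "fully coherent"
    if any(ok):
        return "coherent"
    return "not coherent"

colours = {"blue", "yellow", "red"}
d1 = {"wings": {"absent"}, "wings_colour": colours}
d2 = {"wings": {"absent", "present"}, "wings_colour": colours}
d3 = {"wings": {"present"}, "wings_colour": colours}
for name, sd in (("d1", d1), ("d2", d2), ("d3", d3)):
    print(name, coherence(sd))
```

Enumeration is only practical for tiny examples; it is used here because it follows the definitions literally.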
The Description Potential
• Description Potential: the measure of the COHERENT part of the
volume described by a symbolic description.
• The introduction of dependence rules changes the potential of a description.
• The computation of the description potential is combinatorial.
[Figure: description D with the regions excluded by rules r1 and r2]
Combinatorial aspect of D.P.
x : {a1,a2} × {b1,b2} × {c1,c2} × {d1,d2}
Potential without rules = 2 × 2 × 2 × 2 = 16
(r1) if a ∈ {a1} then b ∈ {b1}
(r2) if c ∈ {c1} then d ∈ {d1}

Individual     Coherent      Individual     Coherent
a1 b1 c1 d1    Y             a2 b1 c1 d1    Y
a1 b1 c1 d2    N (r2)        a2 b1 c1 d2    N (r2)
a1 b1 c2 d1    Y             a2 b1 c2 d1    Y
a1 b1 c2 d2    Y             a2 b1 c2 d2    Y
a1 b2 c1 d1    N (r1)        a2 b2 c1 d1    Y
a1 b2 c1 d2    N (r1,r2)     a2 b2 c1 d2    N (r2)
a1 b2 c2 d1    N (r1)        a2 b2 c2 d1    Y
a1 b2 c2 d2    N (r1)        a2 b2 c2 d2    Y

Potential with rules r1 and r2 = 9 (the rows marked Y)
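The table can be reproduced by enumerating the 16 individuals and keeping those that satisfy both rules. The representation below (a dict of value lists and one Python function per rule) is only an illustrative assumption, not the author's implementation.

```python
from itertools import product

domains = {
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
    "c": ["c1", "c2"],
    "d": ["d1", "d2"],
}

def r1(ind):
    """if a in {a1} then b in {b1}"""
    return ind["a"] != "a1" or ind["b"] == "b1"

def r2(ind):
    """if c in {c1} then d in {d1}"""
    return ind["c"] != "c1" or ind["d"] == "d1"

def coherent_potential(domains, rules):
    """Count the individuals of the description that respect every rule."""
    names = list(domains)
    count = 0
    for combo in product(*domains.values()):
        individual = dict(zip(names, combo))
        if all(rule(individual) for rule in rules):
            count += 1
    return count

print(coherent_potential(domains, []))        # 16, potential without rules
print(coherent_potential(domains, [r1, r2]))  # 9, the rows marked Y
```

The loop visits every individual, which is exactly the combinatorial cost the slide warns about.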
Computation of D.P. without dependencies
\[ \pi(a) = \prod_{i=1}^{p} \mu(A_i) \]
\[ \mu(A_i) = \begin{cases} \mathrm{cardinal}(A_i), & \text{if } y_i \text{ is discrete} \\ \mathrm{Range}(A_i), & \text{if } y_i \text{ is continuous} \end{cases} \]
where Range(A_i) is the absolute value of the difference between the upper bound and the lower bound of interval A_i.
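As a small worked example, the potential without dependencies can be computed for the two species of the introductory table. The sketch assumes a hypothetical representation in which a discrete component is a Python set and a continuous component is a (lower, upper) tuple.

```python
from math import prod

def measure(component):
    """Cardinal of a set, or range |upper - lower| of an interval tuple."""
    if isinstance(component, (set, frozenset)):
        return len(component)
    lower, upper = component
    return abs(upper - lower)

def potential(sd):
    """Product of the measures of all the components of the description."""
    return prod(measure(component) for component in sd.values())

species_1 = {"color": {"Blue", "Red"}, "size": (7, 12)}
species_2 = {"color": {"Yellow"}, "size": (8, 9)}
print(potential(species_1))  # 2 * (12 - 7) = 10
print(potential(species_2))  # 1 * (9 - 8) = 1
```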
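Computation of D.P. with dependencies
Let d = ∧_i [y_i ∈ A_i] be an S.D. and r_j ∈ {r_1, ..., r_t} a rule.
\[ \pi(d \mid r_1 \wedge \dots \wedge r_t) = \prod_{i=1}^{p} \mu(A_i) - \Big[ \sum_{j=1}^{t} \pi(d \wedge \neg r_j) - \sum_{j<k} \pi((d \wedge \neg r_j) \wedge \neg r_k) + \dots + (-1)^{t+1}\, \pi((d \wedge \neg r_1) \wedge \neg r_2 \dots \wedge \neg r_t) \Big] \]
where \mu(A_i) is the cardinal of A_i if y_i is discrete and its range if y_i is continuous.
Complexity:
- exponential in the number of rules
- linear in the number of variables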
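The inclusion-exclusion computation can be sketched for nominal variables as follows. The rule encoding (premise variable, premise values, conclusion variable, allowed conclusion values) and the helper names are assumptions made for this illustration. The part of d that violates a rule is itself a description: the premise variable is restricted to the premise values and the conclusion variable to the values the rule forbids.

```python
from itertools import combinations
from math import prod

def potential(sd):
    """Potential without rules for nominal variables."""
    return prod(len(values) for values in sd.values())

def restrict_to_violation(sd, rule):
    """d AND NOT r: premise holds, conclusion takes a forbidden value."""
    premise_var, premise_values, conclusion_var, allowed = rule
    out = dict(sd)
    out[premise_var] = out[premise_var] & premise_values
    out[conclusion_var] = out[conclusion_var] - allowed
    return out

def coherent_potential(sd, rules):
    """Inclusion-exclusion over all subsets of rules (2**t terms)."""
    total = 0
    for k in range(len(rules) + 1):
        for subset in combinations(rules, k):
            box = sd
            for rule in subset:
                box = restrict_to_violation(box, rule)
            total += (-1) ** k * potential(box)
    return total

x = {"a": {"a1", "a2"}, "b": {"b1", "b2"},
     "c": {"c1", "c2"}, "d": {"d1", "d2"}}
r1 = ("a", {"a1"}, "b", {"b1"})
r2 = ("c", {"c1"}, "d", {"d1"})
print(coherent_potential(x, [r1, r2]))  # 16 - 4 - 4 + 1 = 9
```

The outer loops visit every subset of rules, hence the exponential cost in the number of rules, while each term is a simple product over the variables.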
Our Aim
• Our aim is to represent only the valid (fully coherent) part of an S.D.
– We have to split the description space into different subspaces.
– Each subspace corresponds to one premise variable and all the conclusion variables linked to it.
– Each subspace is cut into slices in which all the values of the premise variable lead to the same conclusion.
Normal Symbolic Form
• if Hand ∈ {absent} then Hand_size = N.A.
• if Hand ∈ {absent} then Finger = N.A.
• if Finger ∈ {absent} then Finger_size = N.A.

      hand                hand_size       finger              finger_size    …
d1    {absent, present}   {big, middle}   {absent, present}   {big, small}   …
d2    {absent, present}   {big, small}    {present}           {small}        …

Dependence tree: Hand → Hand_size, Hand → Finger, Finger → Finger_size
N.S.F.

Initial table
      Hand                Hand_size
d1    {absent, present}   {big, middle}
d2    {absent, present}   {big, small}

Main table
      Hand     …
d1    {1,3}    …
d2    {2,3}    …

Secondary tables

Hand table
N°   Hand        Hand_size       finger_table
1    {present}   {big, middle}   {1,3}
2    {present}   {big, small}    {2}
3    {absent}    N.A.            N.A.

finger_table
N°   finger      finger_size
1    {present}   {big, small}
2    {present}   {small}
3    {absent}    N.A.

3 tables but NO MORE RULES
Normal Symbolic Form
• First NSF condition:
– either no dependency occurs between the variables, or the only dependencies are between the first variable V1 and the others
• Second NSF condition:
– all the values expressed for one object by the premise variable V1 lead to the same conclusion
• Valid only if the rules form a tree or a set of trees (first condition)
• Inspired by Codd's normal forms in databases
Consequences
Two consequences:
• Cut the data table into several tables according to the dependence tree
– possible only if the dependences form a tree or a forest
• Cut each symbolic description into two parts (CQ2)
– one where the premise is true
– one where the premise is false

if Finger ∈ {absent} then Finger_size = N.A.

Before:
Finger              Finger_size
{absent, present}   {big, small}

After:
Finger      Finger_size
{present}   {big, small}
{absent}    N.A.
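The second consequence (CQ2) can be illustrated with a short function that cuts one description into the slice where the premise is false and the slice where it is true. The function name, the dict-of-sets representation and the "N.A." marker are assumptions made for this sketch; only hierarchical rules are handled.

```python
NA = "N.A."

def split_on_premise(sd, premise_var, premise_values, conclusion_vars):
    """Return the slices of sd in which the rule premise is false, then true."""
    slices = []
    false_part = sd[premise_var] - premise_values
    true_part = sd[premise_var] & premise_values
    if false_part:
        # Premise false: the conclusion variables keep their values.
        kept = dict(sd)
        kept[premise_var] = false_part
        slices.append(kept)
    if true_part:
        # Premise true: the conclusion variables become not applicable.
        cut = dict(sd)
        cut[premise_var] = true_part
        for var in conclusion_vars:
            cut[var] = NA
        slices.append(cut)
    return slices

d = {"finger": {"absent", "present"}, "finger_size": {"big", "small"}}
for row in split_on_premise(d, "finger", {"absent"}, ["finger_size"]):
    print(row)
```

A description whose premise variable lies entirely on one side of the rule yields a single slice, so the doubling is a worst case.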
The dependence tree
• For each rule we draw an edge from the premise variable to the conclusion variable
• Each node corresponds to one variable
• Each node can be linked to more than one rule
• Each node and its children correspond to a secondary table
• y1 forms a table with y2, y3, y4
• y2 forms a table with y5 and y6

y1
├── y2
│   ├── y5
│   └── y6
├── y3
└── y4
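A possible way to derive the secondary tables from the rules is sketched below. Each rule is reduced to a hypothetical (premise variable, conclusion variable) pair; the function rejects rule sets that do not form a tree or a forest and groups every premise variable with its children.

```python
from collections import defaultdict

def secondary_tables(edges):
    """Group each premise variable with its conclusion variables."""
    children = defaultdict(list)
    parent = {}
    for premise, conclusion in edges:
        # A conclusion variable may depend on one premise variable only.
        if parent.setdefault(conclusion, premise) != premise:
            raise ValueError(f"{conclusion} has two parents: not a tree or forest")
        if conclusion not in children[premise]:
            children[premise].append(conclusion)
    for node in children:
        # Walk up to the root; meeting a node twice means a cycle.
        seen = {node}
        current = node
        while current in parent:
            current = parent[current]
            if current in seen:
                raise ValueError("the rules contain a cycle")
            seen.add(current)
    return {premise: [premise] + kids for premise, kids in children.items()}

edges = [("y1", "y2"), ("y1", "y3"), ("y1", "y4"), ("y2", "y5"), ("y2", "y6")]
print(secondary_tables(edges))
```

For the tree of the slide this gives one table for y1 with y2, y3, y4 and one for y2 with y5, y6.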
Potential computation with N.S.F.
π(d1) = (ta(1) + ta(3)) × … = (6 + 1) × … = 7 × …
ta(1) = 1 × 2 × (tc(1) + tc(3)) = 1 × 2 × (2 + 1) = 6
ta(3) = 1 × 1 = 1
π(a1) = 2 × (ta(1) + ta(2)) = 2 × (3 + 2) = 10

Main table
      Hand     …
d1    {1,3}    …
d2    {2,3}    …

Table ta
N°   Hand        Hand_Size      Finger_Table
1    {present}   {big, med}     {1,3}
2    {present}   {big, small}   {2}
3    {absent}    N.A.           N.A.

Table tc
N°   Finger      Finger_size
1    {present}   {big, small}
2    {present}   {small}
3    {absent}    N.A.
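The recursion over the tables can be written in a few lines. In this sketch the tables ta and tc are stored in a hypothetical nested dict; a not-applicable cell contributes a factor of 1, a row multiplies its own cells by the summed potentials of the rows it references, and a reference set is the sum of its rows.

```python
from math import prod

NA = None  # not-applicable cell, contributes a factor of 1

tables = {
    "tc": {
        1: {"cells": [{"present"}, {"big", "small"}], "refs": {}},
        2: {"cells": [{"present"}, {"small"}], "refs": {}},
        3: {"cells": [{"absent"}, NA], "refs": {}},
    },
    "ta": {
        1: {"cells": [{"present"}, {"big", "med"}], "refs": {"tc": [1, 3]}},
        2: {"cells": [{"present"}, {"big", "small"}], "refs": {"tc": [2]}},
        3: {"cells": [{"absent"}, NA], "refs": {}},
    },
}

def row_potential(table, row):
    """Own cells times the potential of every referenced secondary table."""
    entry = tables[table][row]
    own = prod(1 if cell is NA else len(cell) for cell in entry["cells"])
    linked = prod(ref_potential(name, rows)
                  for name, rows in entry["refs"].items())
    return own * linked

def ref_potential(table, rows):
    """Potential of a set of referenced rows: the rows are disjoint slices."""
    return sum(row_potential(table, row) for row in rows)

print(row_potential("ta", 1))       # 1 * 2 * (2 + 1) = 6
print(ref_potential("ta", [1, 3]))  # 6 + 1 = 7, the Hand part of d1
```

No rule is tested anywhere: the tables only contain coherent slices, so sums and products are enough.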
Memory Growth (1st approach)
• CQ2 ⇒ the size doubles for each node of the dependence tree
T: number of premise variables, N: number of descriptions, S: size of the biggest secondary table
• S ≤ N × 2^D (D: depth of the tree)
• if the tree is well balanced then D = log2(T), so
  S = N × 2^(log2 T) = N × T ⇒ POLYNOMIAL
• if the tree is not balanced (worst case) then S = N × 2^T
[Figure: chain of tables of sizes N, N×2^2, N×2^3, N×2^4, N×2^5, …, N×2^T]
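The two bounds are easy to compare numerically. The helper names below are hypothetical and the values of N and T are arbitrary; the balanced case rounds log2(T) up so that it also works when T is not a power of two.

```python
from math import ceil, log2

def worst_case_rows(n_descriptions, depth):
    """Upper bound N * 2**D when every level doubles the number of rows."""
    return n_descriptions * 2 ** depth

def balanced_bound(n_descriptions, n_premises):
    """Well-balanced tree: depth log2(T), so the bound is about N * T."""
    return worst_case_rows(n_descriptions, ceil(log2(n_premises)))

def unbalanced_bound(n_descriptions, n_premises):
    """Degenerate tree (a chain): depth T, so the bound is N * 2**T."""
    return worst_case_rows(n_descriptions, n_premises)

print(balanced_bound(100, 8))    # 100 * 8 = 800
print(unbalanced_bound(100, 8))  # 100 * 2**8 = 25600
```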
Data Factorisation
if wings ∈ {absent} then wings_colour = NA.
if wings_colour ∈ {red} then Thorax_colour ∈ {blue}.

Initial table
      wings               wings_color    Thor_col         Thorax_size
d1    {absent, present}   {red, blue}    {blue, yellow}   {big, small}
d2    {absent, present}   {red, green}   {blue, red}      {small}
d3    {absent}            NA             {blue, yellow}   {med, small}

Main table
      Wings...   Thorax_Size
d1    {1,2}      {big, small}
d2    {2,3}      {small}
d3    {2}        {med, small}

wings table
N°   wings    Color
1    {pres}   {1,2}
2    {abs}    {4}
3    {pres}   {1,3}

color table
N°   Wings_co   Thorax_Col
1    {red}      {blue}
2    {blue}     {blue, yell}
3    {green}    {blue, red}
4    NA         {blue, yell}
Usual operations with N.S.F.
• No changes, BUT recursion is needed because of the numerous tables (following the tree of tables)
• Two kinds of operations
– creating a new volume (join, union, ...)
– restricting an existing volume (intersection, ...)
The Join (without N.S.F.)
An Operation Creating a new volume
Operations with N.S.F. (creating a new volume)
[Figure: two cases, labelled "OK" and "Suicide ??"]
The Bounds
We first consider nominal variables and hierarchical rules.
2 cases:
- Locally: between mother and daughter
- Globally: between the root and a leaf
Sn: size according to CQ2
Sv: size according to the number of possible different descriptions
N_daughter, N_mother: sizes of the daughter and mother tables, F_l = N_d / N_m
\[ F_{local} = \frac{N_{daughter}}{N_{mother}} \le \frac{\min(S_n, S_v)}{N_{mother}} \le 2 \]
One set of premise values
• One premise variable y whose values are divided into two sets, A and Ā
• Locally
– m conclusion variables x1 … xm
– N_mother, N_daughter: table sizes
– according to CQ2:
  S_n = 2 × N_m
– according to the variables' domains:
  \[ S_v = (2^{|A|} - 1) + (2^{|\bar{A}|} - 1) \prod_{j=1}^{m} (2^{|X_j|} - 1) \]
– F_local ≤ 2
Globally
- A line with NA cannot refer to secondary tables
- F_global ≤ 2
- There is no following element for A
- No global growth
[Figure: tables of size N linked through the premise sets A and A1]
More than one set of premise values
• Until now: one set of premise values
• But more than one (n) can exist:
if travel ∈ {plane} then car = N.A.
if travel ∈ {plane} then train = N.A.
if travel ∈ {car} then plane = N.A. …
\[ F_l = \frac{N_D}{N_M} \le \frac{\min(T_n, T_d)}{N_M} \le n \]
Tables Occupation
• N items to put into m possible descriptions (places)
[Figure: places holding items, e.g. (I31, I37, I70), (I10, I20), (I15, I27)]
• How many busy places?
• Maxwell-Boltzmann statistics in physics
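One classical way to answer the "how many busy places" question, under the assumption that each of the N items falls independently and uniformly into one of the m places, is the expected occupancy m (1 − (1 − 1/m)^N). The slides do not state which formula the author uses, so the sketch below only illustrates that standard result; the function name and the example values are hypothetical.

```python
def expected_busy_places(n_items, m_places):
    """Expected number of non-empty places for uniform independent placement."""
    return m_places * (1 - (1 - 1 / m_places) ** n_items)

# Few items compared with the places: almost every item gets its own place.
print(expected_busy_places(73, 10_000))
# Many items compared with the places: the table is almost full.
print(expected_busy_places(73, 8))
```

The second case is the situation in which many descriptions share a row, which is what makes factorisation effective.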
Tables Occupation (without rules)
N initial individuals, m possible descriptions
if d_1, …, d_j, …, d_p are the variables:
\[ m = |P(X_1) \setminus \{\emptyset\}| \times \dots \times |P(X_j) \setminus \{\emptyset\}| \times \dots \times |P(X_p) \setminus \{\emptyset\}| \]
• With independent variables and equirepartition:
\[ p(d_1, \dots, d_p) = \prod_{i=1}^{p} \frac{1}{|P(X_i)| - 1} = \prod_{i=1}^{p} \frac{1}{2^{|X_i|} - 1} \]
Tables occupation with rules
• 2 tables are generated
– one where the rules did NOT apply
– one where the rules did apply
(T1)
(T2)
• For T2 preceding problem
• T1 size:
– small compared with N ⇒ T1 is mostly full
– ⇒ factorisation
• the size grows according to P(A)
• the greater P(A) is, the greater the factorisation will be
Application
• Phlebotomines (sandflies)
– 73 species (descriptions)
– 53 nominal variables
– 5 rules in 3 different trees with 8 variables
• Fg = Fl
• Secondary tables
– 32 lines (56) ⇒ Fl = 32/73 = 0.438
– 18 lines (39) ⇒ Fl = 18/73 = 0.246
– 16 lines (30) ⇒ Fl = 16/73 = 0.219
About complexity
• Size
– overhead from the reference variables
– overhead induced by the possible growth factor F
• Computation
– the cost related to the rules disappears
– an overhead due to the NSF transformation appears (N², like a sort), but only once
– a minor (linear) overhead appears with some operations (like the join)
– overhead due to the recursion
perspectives and conclusions
• Our work has reached a mature point
• But it is still incomplete:
– accept a dependence graph instead of a dependence tree
– adapt algorithms to N.S.F.
– first distances and comparisons
– clustering, factorial analysis, ...
• Carry out simulation studies