Ontologie GLB

Department of Instrumentation and Control Engineering,
Faculty of Mechanical Engineering, CTU in Prague,
Technická 4, 166 07 Prague 6
The Application of Data Mining Methods in Monitoring of Ecosystems
Jiri BILA and Jakub JURA
Jiri.Bila@fs.cvut.cz
Jakub.Jura@fs.cvut.cz
Monitoring of Ecosystems
• 11 Measuring Stations
• 13 variables
• Sampling period 6 minutes
Database system for
monitoring
Measuring station Domanín
[Table: database records from 2.12.2007, 8:10 to 11:30 in 10-minute steps. Columns: time of measurement (dd.mm.yyyy hh:mm), precipitation (mm), humidity at 2 m (%), temperature at 2 m (°C), humidity at 30 cm (%), temperature at 30 cm (°C), GR incident (W/m2), GR reflected (W/m2), soil moisture (%), soil temperature 1, 2 and 3 (°C).]
Data Mining
• Knowledge discovery in databases is "the
nontrivial process of identifying valid, novel,
potentially useful, and ultimately
understandable patterns in data" (Fayyad, 1996).
Used Data Mining Methods
• Conceptual Lattice
• Rough Sets
Conceptual Lattice
• Data Mining Context
• C = (O, I, R)
– O is a set of objects x
– I is a set of items (attributes) y
– R is a binary relation R ⊆ O × I
Conceptual Lattice
• Conceptual Lattice L
• Derived from Data Mining Context C
• X = { x ∈ O | ∀y ∈ Y: x R y }
• Y = { y ∈ I | ∀x ∈ X: x R y }
– X is the largest such set of objects, X ⊆ O
– Y is the largest such set of items, Y ⊆ I
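The two derivation operators above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `common_items` and `common_objects` and the toy relation are assumptions made for the example.

```python
# Toy context C = (O, I, R); the objects, items and pairs are illustrative only.
O = {"x1", "x2", "x3"}
I = {"a", "b", "c"}
R = {("x1", "a"), ("x1", "b"), ("x2", "a"),
     ("x3", "a"), ("x3", "b"), ("x3", "c")}


def common_items(X):
    """Y = { y in I | for all x in X: x R y }."""
    return {y for y in I if all((x, y) in R for x in X)}


def common_objects(Y):
    """X = { x in O | for all y in Y: x R y }."""
    return {x for x in O if all((x, y) in R for y in Y)}


# A pair (X, Y) is a lattice node when each set is derived from the other.
Y = common_items({"x1", "x3"})
X = common_objects(Y)
print(sorted(X), sorted(Y))
```

Starting from any set of objects, applying `common_items` and then `common_objects` yields the largest object set sharing those items, which is the maximality condition stated above.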
Conceptual Lattice
• Hasse diagram
• The Hasse diagram is constructed using the
partial ordering "<".
– An edge from H1 to H2 exists if H1 < H2 and no
element H3 satisfies H1 < H3 < H2.
– H1 is an antecedent of element H2 (H2 is a
descendant of element H1).
– A pair (X, Y) represents a node in the Hasse
diagram.
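The edge rule can be sketched as a small covering-relation computation. The sketch assumes that each node is identified by its object set and that "<" is strict set inclusion; the node list and the name `hasse_edges` are hypothetical.

```python
from itertools import permutations

# Hypothetical nodes, each identified by its set of objects.
nodes = [
    frozenset(),
    frozenset({1}),
    frozenset({2}),
    frozenset({1, 2}),
    frozenset({1, 2, 3}),
]


def hasse_edges(nodes):
    """Return pairs (H1, H2) with H1 < H2 and no H3 such that H1 < H3 < H2."""
    edges = []
    for h1, h2 in permutations(nodes, 2):
        # For frozensets, "<" is the proper-subset test.
        if h1 < h2 and not any(h1 < h3 < h2 for h3 in nodes):
            edges.append((h1, h2))
    return edges


for lower, upper in hasse_edges(nodes):
    print(sorted(lower), "->", sorted(upper))
```

The pair (∅, {1, 2}) gets no edge because {1} lies strictly between the two nodes.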
Database transformation
Conceptual Lattice - Example
• C = (A0, A1, A2, A3, A4,3, 4, 7, 8, 9, R)
• Where:
–
–
–
–
C … context of data mining
A0, A1, A2, A3, A4 … Monitoring Classes
3, 4, 7, 8, 9 … Situations
R … relation which is represented in the table
MG
Monitoring Classes
Situations
Conceptual Lattice
Table MG, which represents the relation R.
MG | 3 | 4 | 7 | 8 | 9
A0 | 1 | 1 | 1 | 1 | 1
A1 | 1 | 1 | 0 | 0 | 0
A2 | 0 | 1 | 1 | 1 | 1
A3 | 1 | 0 | 1 | 1 | 0
A4 | 1 | 1 | 1 | 0 | 1
Conceptual Lattice
• Hasse diagram; each node is a pair (situations), monitoring classes:
– (3,4,7,8,9), A0
– (3,4), A0 A1
– (4,7,8,9), A0 A2
– (3,7,8), A0 A3
– (3,4,7,9), A0 A4
– (4), A0 A1 A2
– (3), A0 A1 A3
– (3,4), A0 A1 A4
– (7,8), A0 A2 A3
– (4,7,9), A0 A2 A4
– (3,7), A0 A3 A4
– (4), A0 A1 A2 A4
– (3), A0 A1 A3 A4
– (7), A0 A2 A3 A4
– (∅), A0 A1 A2 A3 A4
Conceptual Lattice
• Measures guaranteeing a rule's reliability and validity.
• Support
– supp(Ai, S) = |{ s ∈ S | Ict(s, Ai) }| / |S|
– Supp(Ai → Aj, S) = supp(Ai ∪ Aj, S)
• Confidence
– Conf(Ai → Aj, S) = Supp(Ai → Aj, S) / supp(Ai, S)
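The support and confidence formulas can be checked with a short Python sketch. The class memberships are read from the lattice nodes of the example (for instance "(3,4), A0 A1"); the dictionary `members` and the function names are assumptions made for illustration.

```python
# Situations and, for each monitoring class, the situations that fall into it,
# as read from the example's lattice nodes; treat the data as illustrative.
S = [3, 4, 7, 8, 9]
members = {
    "A0": {3, 4, 7, 8, 9},
    "A1": {3, 4},
    "A2": {4, 7, 8, 9},
    "A3": {3, 7, 8},
    "A4": {3, 4, 7, 9},
}


def supp(classes, S):
    """Fraction of situations in S that fall into every class in `classes`."""
    hits = [s for s in S if all(s in members[a] for a in classes)]
    return len(hits) / len(S)


def conf(antecedent, consequent, S):
    """Conf(Ai -> Aj, S) = Supp(Ai -> Aj, S) / supp(Ai, S)."""
    return supp(antecedent + consequent, S) / supp(antecedent, S)


print(supp(["A1", "A4"], S), conf(["A1"], ["A4"], S))
print(supp(["A3", "A4"], S), round(conf(["A3"], ["A4"], S), 2))
```

For the rule A1 → A4 this gives support 0.4 and confidence 1. For A3 → A4 the confidence is 2/3, which the slides list truncated as 0.66.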
Rule No. i | Rule ri | Supp(ri) | Conf(ri)
1 | A1 → A2 | 0.2 | 0.5
2 | A1 → A3 | 0.2 | 0.5
3 | A1 → A4 | 0.4 | 1
4 | A2 → A3 | 0.4 | 0.5
5 | A3 → A4 | 0.4 | 0.66
6 | A1 A2 → A4 | 0.2 | 1
7 | A2 A4 → A1 | 0.2 | 0.33
8 | A2 A3 → A4 | 0.2 | 0.5
9 | A2 A4 → A3 | 0.2 | 0.33
10 | A1 A3 → A4 | 0.2 | 1
11 | A3 A4 → A1 | 0.2 | 0.5
Rough Sets
• Relation of indiscernibility
• ∀ x1, x2 ∈ U:
• (x1 RE(A) x2) ⇔ ∀ ai ∈ A: g(x1, ai) = g(x2, ai)
• Where:
– U … universe of elements
– A … set of attributes
– Vai … set of values of attribute ai
– g: U × A → V
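The indiscernibility relation splits the universe into equivalence classes U/RE(A). A minimal Python sketch follows; the attribute names, values and the helper `partition` are hypothetical and not taken from the monitoring database.

```python
from collections import defaultdict

# Hypothetical universe: g(x, a) is stored as table[x][a].
table = {
    "x1": {"temp": "low", "humidity": "high"},
    "x2": {"temp": "low", "humidity": "high"},
    "x3": {"temp": "high", "humidity": "high"},
    "x4": {"temp": "high", "humidity": "low"},
}


def g(x, a):
    """Information function g: U x A -> V."""
    return table[x][a]


def partition(U, A):
    """Equivalence classes U/RE(A): elements that agree on every attribute in A."""
    classes = defaultdict(set)
    for x in U:
        classes[tuple(g(x, a) for a in A)].add(x)
    return list(classes.values())


print(partition(table, ["temp", "humidity"]))
print(partition(table, ["temp"]))
```

Dropping an attribute can only merge classes: with `temp` alone, x3 and x4 become indiscernible.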
Rough Sets
• Which elements of the universe U belong, and
with what certainty, to the subset X ⊂ U
that we are interested in?
• Lower Approximation
• Upper Approximation
• Boundary region
Rough Sets
• Lower Approximation
• The Lower Approximation (positive area
PosiRE(X)) is the set of objects which
certainly belong to the subset X.
• PosiRE(X) = ∪ { Y | (Y ∈ U/RE) AND
(Y ⊆ X) }
Rough Sets
• Upper Approximation
• The set of elements from U which
may (possibly) belong to X.
• PossRE(X) = ∪ { Y | (Y ∈ U/RE) AND (Y
∩ X ≠ ∅) }
Rough Sets
• Boundary region
• The difference between the upper and lower
approximations of X.
• BoundRE(X) = PossRE(X) - PosiRE(X)
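The three sets defined on the preceding slides can be sketched in Python directly from their formulas. The partition U/RE is assumed to be given as a list of sets; the function names and sample data are hypothetical.

```python
def lower_approximation(classes, X):
    """PosiRE(X): union of equivalence classes fully contained in X."""
    return set().union(*(Y for Y in classes if Y <= X))


def upper_approximation(classes, X):
    """PossRE(X): union of equivalence classes that intersect X."""
    return set().union(*(Y for Y in classes if Y & X))


def boundary(classes, X):
    """BoundRE(X) = PossRE(X) - PosiRE(X)."""
    return upper_approximation(classes, X) - lower_approximation(classes, X)


# Hypothetical partition U/RE and target subset X.
classes = [{"x1", "x2"}, {"x3"}, {"x4", "x5"}]
X = {"x1", "x2", "x4"}

print(sorted(lower_approximation(classes, X)))
print(sorted(upper_approximation(classes, X)))
print(sorted(boundary(classes, X)))
```

Here x4 and x5 are indiscernible but only x4 is in X, so both end up in the boundary region.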
Rough Sets
• Rough Set
• A rough set is a subset X of the universe U that is
defined using the upper and lower
approximations (PossRE(X), PosiRE(X)) and for
which:
• BoundRE(X) ≠ ∅
Rough Sets
• Accuracy of the rough approximation.
• αRE(X) = card(PosiRE(X)) / card(PossRE(X))
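The accuracy ratio can be computed in a few lines of Python. The sketch assumes the partition U/RE is given as a list of sets; the name `accuracy` and the sample data are hypothetical, and the guard for an empty upper approximation is an added assumption, since the slide does not cover that case.

```python
def accuracy(classes, X):
    """card(PosiRE(X)) / card(PossRE(X)) for a partition `classes` of U."""
    lower = set().union(*(Y for Y in classes if Y <= X))
    upper = set().union(*(Y for Y in classes if Y & X))
    if not upper:
        # Assumption: treat the ratio as undefined rather than pick a value.
        raise ValueError("accuracy is undefined for an empty upper approximation")
    return len(lower) / len(upper)


# Hypothetical partition U/RE.
classes = [{"x1", "x2"}, {"x3"}, {"x4", "x5"}]

print(accuracy(classes, {"x1", "x2", "x4"}))  # rough: lower has 2, upper has 4
print(accuracy(classes, {"x1", "x2", "x3"}))  # exact: lower equals upper
```

A value of 1 means the boundary region is empty, so X is not rough with respect to the given partition.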
Conclusion
• The paper proposes the application of two data
mining methods, conceptual lattices and rough
sets, with fragments of a monitoring system
database used as supporting data. The original
database content cannot be used directly; it
must first be transformed into forms usable by
the selected data mining methods. The success
of the data mining process also depends strongly
on the definition of the monitoring classes and the
"operation" situations (formulated by experts).