The Application of Data Mining Methods in Monitoring of Ecosystems

Jiri BILA and Jakub JURA
Jiri.Bila@fs.cvut.cz, Jakub.Jura@fs.cvut.cz
Department of Instrumentation and Control Engineering, Faculty of Mechanical Engineering, CTU in Prague, Technická 4, 166 07 Prague 6

Monitoring of Ecosystems
• 11 measuring stations
• 13 variables
• Sampling period: 6 minutes

Database system for monitoring
[Figure: sample records from the measuring station Domanín, 2 December 2007, 8:10 to 11:30. Columns: soil temperature 3, 2 and 1 (°C), soil moisture (%), global radiation reflected and incident (W/m²), temperature and humidity at 30 cm, temperature and humidity at 2 m, precipitation (mm), measurement time (dd.mm.yyyy hh:mm:ss).]

Data Mining
• Knowledge discovery in databases is "the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data" Fayyad (1996).

Data Mining Methods Used
• Conceptual lattice
• Rough sets

Conceptual Lattice
• Data mining context
• $C = (O, I, R)$
  – $O$ is a set of objects $x$
  – $I$ is a set of items (attributes) $y$
  – $R$ is a binary relation, $R \subseteq O \times I$

Conceptual Lattice
• Conceptual lattice $L$
• Derived from the data mining context $C$
• $X = \{x \in O \mid \forall y \in Y : x \, R \, y\}$
• $Y = \{y \in I \mid \forall x \in X : x \, R \, y\}$
  – $X$ is the largest set of objects, $X \subseteq O$
  – $Y$ is the largest set of items, $Y \subseteq I$

Conceptual Lattice
• Hasse diagram
• The Hasse diagram is constructed using the partial order "<".
  – An edge from H1 to H2 exists if H1 < H2 and there is no element H3 such that H1 < H3 < H2.
  – H1 is an antecedent of the element H2 (H2 is a descendant of the element H1).
  – A pair (X, Y) represents a node in the Hasse diagram.

Conceptual Lattice - Example
• $C = (\{A_0, A_1, A_2, A_3, A_4\}, \{3, 4, 7, 8, 9\}, R)$
• where:
  – C … context of data mining
  – A0, A1, A2, A3, A4 … monitoring classes
  – 3, 4, 7, 8, 9 … situations
  – R … relation represented in table MG

Conceptual Lattice
Table MG, which represents the relation R.
MG | 3 | 4 | 7 | 8 | 9
A0 | 1 | 1 | 1 | 1 | 1
A1 | 1 | 1 |   |   |
A2 |   | 1 | 1 | 1 | 1
A3 | 1 |   | 1 | 1 |
A4 | 1 | 1 | 1 |   | 1

Conceptual Lattice
Hasse diagram nodes (situations X, monitoring classes Y):
• (3, 4, 7, 8, 9), A0
• (3, 4, 7, 9), A0 A4
• (4, 7, 8, 9), A0 A2
• (3, 7, 8), A0 A3
• (3, 4), A0 A1
• (4, 7, 9), A0 A2 A4
• (3, 7), A0 A3 A4
• (7, 8), A0 A2 A3
• (3, 4), A0 A1 A4
• (3), A0 A1 A3
• (4), A0 A1 A2
• (7), A0 A2 A3 A4
• (3), A0 A1 A3 A4
• (4), A0 A1 A2 A4
• (∅), A0 A1 A2 A3 A4

Conceptual Lattice
• Guarantee of the rules' reliability and validity
• Support
  – $\mathrm{supp}(A_i, S) = \mathrm{card}(\{s \in S \mid \mathrm{Ict}(s, A_i)\}) / \mathrm{card}(S)$
  – $\mathrm{supp}(A_i \Rightarrow A_j, S) = \mathrm{supp}(A_i \cup A_j, S)$
• Confidence
  – $\mathrm{conf}(A_i \Rightarrow A_j, S) = \mathrm{supp}(A_i \Rightarrow A_j, S) / \mathrm{supp}(A_i, S)$

Rule No. i | Rule r_i | Supp(r_i) | Conf(r_i)
1 | A1 ⇒ A2 | 0.2 | 0.5
2 | A1 ⇒ A3 | 0.2 | 0.5
3 | A1 ⇒ A4 | 0.4 | 1
4 | A2 ⇒ A3 | 0.4 | 0.5
5 | A3 ⇒ A4 | 0.4 | 0.66
6 | A1 A2 ⇒ A4 | 0.2 | 1
7 | A2 A4 ⇒ A4 | 0.2 | 0.33
8 | A2 A3 ⇒ A4 | 0.2 | 0.5
9 | A2 A4 ⇒ A3 | 0.2 | 0.33
10 | A1 A3 ⇒ A4 | 0.2 | 1
11 | A3 A4 ⇒ A1 | 0.2 | 0.5

Rough Sets
• Relation of indiscernibility
• $x_1, x_2 \in U$:
• $(x_1 \, RE(A) \, x_2) \Leftrightarrow (\forall a_i \in A)\,(g(x_1, a_i) = g(x_2, a_i))$
• where:
  – U … universe of elements
  – A … set of attributes
  – $V_{a_i}$ … sets of values
  – $g: U \times A \to V$

Rough Sets
• Which elements of the universe U, and with what certainty, approximate the subset $X \subset U$ that we are interested in?
• Lower approximation
• Upper approximation
• Boundary set

Rough Sets
• Lower approximation
• The lower approximation (positive region $\mathrm{Posi}_{RE}(X)$) is the set of objects that certainly belong to the subset X.
• $\mathrm{Posi}_{RE}(X) = \bigcup \{Y \mid Y \in U/RE \wedge Y \subseteq X\}$

Rough Sets
• Upper approximation
• The set of elements of U that possibly belong to X.
• $\mathrm{Poss}_{RE}(X) = \bigcup \{Y \mid Y \in U/RE \wedge Y \cap X \neq \emptyset\}$

Rough Sets
• Boundary region
• The difference between the upper and lower approximations of X.
• $\mathrm{Bound}_{RE}(X) = \mathrm{Poss}_{RE}(X) - \mathrm{Posi}_{RE}(X)$

Rough Sets
• Rough set
• A rough set is a subset X of the universe U that is defined by its upper and lower approximations $(\mathrm{Poss}_{RE}(X), \mathrm{Posi}_{RE}(X))$ and for which:
• $\mathrm{Bound}_{RE}(X) \neq \emptyset$

Rough Sets
• Rough accuracy of approximation
• $\alpha_{RE}(X) = \mathrm{card}(\mathrm{Posi}_{RE}(X)) / \mathrm{card}(\mathrm{Poss}_{RE}(X))$

Conclusion
• The paper proposed the application of two data mining methods.
• Fragments of a monitoring system database were used as the data basis.
• The paper emphasises that the original database content cannot be used directly: it must first be transformed into forms usable by the selected data mining methods.
• The success of the data mining process also depends strongly on the definition of the monitoring classes and of the "operation" situations (formulated by experts).
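The conceptual lattice construction described in the slides (deriving X and Y from the context, then connecting nodes in the Hasse diagram) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that situations play the role of objects and monitoring classes the role of items, as in the Hasse diagram node list, and it encodes the relation R of table MG as read from that node list (A0 holds in all situations, A1 in 3 and 4, A2 in 4, 7, 8 and 9, A3 in 3, 7 and 8, A4 in 3, 4, 7 and 9). All function names are hypothetical. The sketch keeps only closed pairs, where X and Y are both the largest sets satisfying the definitions, and orders nodes by inclusion of their situation sets.

```python
from itertools import combinations

# Assumed encoding of relation R (table MG): class -> situations where it holds.
CLASS_SITUATIONS = {
    "A0": frozenset({3, 4, 7, 8, 9}),
    "A1": frozenset({3, 4}),
    "A2": frozenset({4, 7, 8, 9}),
    "A3": frozenset({3, 7, 8}),
    "A4": frozenset({3, 4, 7, 9}),
}
SITUATIONS = frozenset({3, 4, 7, 8, 9})
CLASSES = frozenset(CLASS_SITUATIONS)


def classes_of(situations):
    """Y from X: the largest set of classes holding in every situation of X."""
    return frozenset(c for c in CLASSES if situations <= CLASS_SITUATIONS[c])


def situations_of(classes):
    """X from Y: the largest set of situations in which every class of Y holds."""
    result = SITUATIONS
    for c in classes:
        result = result & CLASS_SITUATIONS[c]
    return result


def concepts():
    """All closed pairs (X, Y), found by closing every subset of classes."""
    found = set()
    for k in range(len(CLASSES) + 1):
        for subset in combinations(sorted(CLASSES), k):
            x = situations_of(subset)
            found.add((x, classes_of(x)))
    return found


def hasse_edges(nodes):
    """Edge H1 -> H2 iff H1 < H2 (by situation sets) and no H3 lies strictly between."""
    edges = []
    for low in nodes:
        for high in nodes:
            if low[0] < high[0] and not any(
                low[0] < mid[0] < high[0] for mid in nodes
            ):
                edges.append((low, high))
    return edges


if __name__ == "__main__":
    nodes = concepts()
    # Print nodes from the top (all situations) down to the bottom (no situation).
    for x, y in sorted(nodes, key=lambda n: (-len(n[0]), sorted(n[0]))):
        print(sorted(x), " ".join(sorted(y)))
    print(len(nodes), "nodes,", len(hasse_edges(nodes)), "edges")
```

For this relation the sketch yields 12 closed pairs, from ({3, 4, 7, 8, 9}, A0) down to (∅, A0 A1 A2 A3 A4), connected by 20 Hasse edges. Closing every subset of classes is exponential in the number of classes, which is harmless for five classes; larger contexts would call for a dedicated algorithm such as NextClosure.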
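The support and confidence formulas from the conceptual lattice slides can be checked against the rule table with a short Python sketch. It assumes that Ict(s, Ai) means "monitoring class Ai holds in situation s", so that the support of a set of classes is the fraction of situations in which all of them hold. The relation is the one read from the Hasse diagram node list, and the function names `supp`, `rule_supp` and `conf` are hypothetical. `Fraction` keeps values such as 2/3 exact.

```python
from fractions import Fraction

# Assumed encoding of relation R (table MG): class -> situations where it holds.
CLASS_SITUATIONS = {
    "A0": frozenset({3, 4, 7, 8, 9}),
    "A1": frozenset({3, 4}),
    "A2": frozenset({4, 7, 8, 9}),
    "A3": frozenset({3, 7, 8}),
    "A4": frozenset({3, 4, 7, 9}),
}
SITUATIONS = frozenset({3, 4, 7, 8, 9})


def supp(classes):
    """supp(Ai, S): fraction of situations in S in which every given class holds."""
    covered = SITUATIONS
    for c in classes:
        covered = covered & CLASS_SITUATIONS[c]
    return Fraction(len(covered), len(SITUATIONS))


def rule_supp(antecedent, consequent):
    """supp(Ai => Aj, S) = supp(Ai u Aj, S)."""
    return supp(set(antecedent) | set(consequent))


def conf(antecedent, consequent):
    """conf(Ai => Aj, S) = supp(Ai => Aj, S) / supp(Ai, S).

    Raises ZeroDivisionError when the antecedent never occurs.
    """
    return rule_supp(antecedent, consequent) / supp(antecedent)


if __name__ == "__main__":
    # A few rules from the rule table, as (antecedent, consequent) pairs.
    rules = [
        (["A1"], ["A2"]),
        (["A1"], ["A4"]),
        (["A3"], ["A4"]),
        (["A1", "A2"], ["A4"]),
        (["A2", "A4"], ["A3"]),
        (["A3", "A4"], ["A1"]),
    ]
    for lhs, rhs in rules:
        s, c = rule_supp(lhs, rhs), conf(lhs, rhs)
        print(f"{' '.join(lhs)} => {' '.join(rhs)}: supp={float(s):.2f}, conf={float(c):.2f}")
```

The computed values agree with the table; the confidence of A3 ⇒ A4 is exactly 2/3, which the table truncates to 0.66 and the print statement rounds to 0.67.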
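The rough set definitions (indiscernibility, lower and upper approximation, boundary region and accuracy) can be illustrated with a minimal Python sketch. The example is purely illustrative and not taken from the paper: the universe U is the set of situations {3, 4, 7, 8, 9}, the attributes are the monitoring classes A1 and A4, and g(x, a) is assumed to be 1 when class a holds in situation x and 0 otherwise, using the relation read from the Hasse diagram node list. The target subset X is a hypothetical choice: the situations in which A2 holds. All function names are hypothetical.

```python
from fractions import Fraction

# Assumed encoding of relation R (table MG): class -> situations where it holds.
CLASS_SITUATIONS = {
    "A0": frozenset({3, 4, 7, 8, 9}),
    "A1": frozenset({3, 4}),
    "A2": frozenset({4, 7, 8, 9}),
    "A3": frozenset({3, 7, 8}),
    "A4": frozenset({3, 4, 7, 9}),
}
UNIVERSE = frozenset({3, 4, 7, 8, 9})


def g(x, attribute):
    """Information function g: U x A -> V, here with V = {0, 1}."""
    return 1 if x in CLASS_SITUATIONS[attribute] else 0


def elementary_sets(universe, attributes):
    """Partition U/RE(A): elements with equal g-values on every attribute."""
    blocks = {}
    for x in sorted(universe):
        key = tuple(g(x, a) for a in attributes)
        blocks.setdefault(key, set()).add(x)
    return [frozenset(b) for b in blocks.values()]


def lower_approximation(blocks, target):
    """Posi_RE(X): union of the blocks contained in X."""
    return frozenset().union(*(b for b in blocks if b <= target))


def upper_approximation(blocks, target):
    """Poss_RE(X): union of the blocks that intersect X."""
    return frozenset().union(*(b for b in blocks if b & target))


def boundary(blocks, target):
    """Bound_RE(X) = Poss_RE(X) - Posi_RE(X)."""
    return upper_approximation(blocks, target) - lower_approximation(blocks, target)


def accuracy(blocks, target):
    """card(Posi_RE(X)) / card(Poss_RE(X)); X must be non-empty."""
    return Fraction(
        len(lower_approximation(blocks, target)),
        len(upper_approximation(blocks, target)),
    )


if __name__ == "__main__":
    blocks = elementary_sets(UNIVERSE, ("A1", "A4"))
    target = CLASS_SITUATIONS["A2"]  # hypothetical subset X of interest
    print("U/RE:", [sorted(b) for b in blocks])
    print("lower:", sorted(lower_approximation(blocks, target)))
    print("upper:", sorted(upper_approximation(blocks, target)))
    print("boundary:", sorted(boundary(blocks, target)))
    print("accuracy:", accuracy(blocks, target))
```

With these choices U/RE = {{3, 4}, {7, 9}, {8}}, the lower approximation is {7, 8, 9}, the upper approximation is the whole universe, the boundary {3, 4} is non-empty (so X is a rough set with respect to A1 and A4), and the accuracy is 3/5.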