Cognitive
data analysis
Nikolay Zagoruiko
Institute of Mathematics of the Siberian Division
of the Russian Academy of Sciences,
Pr. Koptyg 4, 630090 Novosibirsk, Russia,
zag@math.nsc.ru
Areas of interest
Data Analysis, Pattern Recognition, Empirical Prediction,
Discovery of Regularities, Data Mining, Machine Learning,
Knowledge Discovery, Intelligent Data Analysis
Cognitive Calculations
Human-centered approach:
The person as an object: studying human cognitive mechanisms
New strategic tasks cannot be solved without a rapid increase in
the intellectual level of the tools for observation, analysis
and control.
The person as a subject: the user of the analysis results
The complexity of these tools and the nature of the results they
produce make the results hard to understand. Under these
conditions the person is effectively excluded from the man-machine
control system.
Specifics of DM tasks:
• Large volumes of data
• Attributes of different types
• Number of attributes >> number of objects
• Presence of noise and missing values
• No information about distributions and dependences
Ontology of DM
The abundance of methods is the result of
the absence of a uniform approach
to solving tasks of different types
What can we learn from the person?
What decision rules does the person use?
[Figure: Recognition, 1967]
What decision rules does the person use?
[Figure: Taxonomy, 1967]
1. A person understands the results if the classes are separated
by perpendicular planes
[Figure: three plots in axes x, y; rotated axes X', Y'; boundary line X = 0.8Y - 3]
2. A person understands the results if the classes are
described by standards
[Figure: three plots in axes x, y with the standards marked by asterisks; rotated axes X', Y']
The unique human ability to recognize
hard-to-distinguish patterns is based on the ability
to select informative features.
Does the person switch from one basis to another when solving
different classification tasks?
Most likely, people use
some universal psycho-physiological function
Our hypothesis:
The basic function used by the person in
classification, recognition, feature selection, etc.
is a measure of
similarity
Functions of Similarity
1) $F_{S1}(a,b) = 1 - \sum_{i=1}^{n} \alpha_i (x_i^a - x_i^b)^2$,
2) $F_{S2}(a,b) = 1 - \sum_{i=1}^{n} \alpha_i |x_i^a - x_i^b|$,
3) $F_{S3}(a,b) = 1 - \max_i |x_i^a - x_i^b|$,
4) $F_{S4}(a,b) = \sum_{i=1}^{n} \alpha_i \frac{\min(x_i^a, x_i^b)}{\max(x_i^a, x_i^b)}$,
5) $F_{S5}(a,b) = 1 - e^{-\sum_{i=1}^{n} (x_i^a - x_i^b)^2}$, ...
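Measures 2 to 4 of this list are simple enough to write directly in code. The following is a minimal Python sketch, not the author's implementation: the function names are illustrative, and it assumes features scaled to [0, 1], strictly positive values for the ratio measure, and weights alpha that sum to 1 (none of these conventions is stated on the slide).

```python
from typing import Sequence


def fs2(a: Sequence[float], b: Sequence[float], alpha: Sequence[float]) -> float:
    """Measure 2: one minus the weighted sum of absolute differences."""
    return 1.0 - sum(w * abs(x - y) for w, x, y in zip(alpha, a, b))


def fs3(a: Sequence[float], b: Sequence[float]) -> float:
    """Measure 3: one minus the largest absolute difference over features."""
    return 1.0 - max(abs(x - y) for x, y in zip(a, b))


def fs4(a: Sequence[float], b: Sequence[float], alpha: Sequence[float]) -> float:
    """Measure 4: weighted sum of min/max ratios (assumes positive values)."""
    return sum(w * min(x, y) / max(x, y) for w, x, y in zip(alpha, a, b))


# Two objects described by two features, equal weights (assumed).
a, b, alpha = [0.2, 0.8], [0.4, 0.4], [0.5, 0.5]
print(fs2(a, b, alpha), fs3(a, b), fs4(a, b, alpha))
```

Under these assumptions all three measures equal 1 for identical objects and decrease as the objects move apart.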
Similarity is not an absolute
but a relative category
Is object b close to a, or is it distant?
[Figure: objects a and b shown alone, then together with a third object c at different positions]
We need to know the answer to the question:
In competition with what?
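Function of Competitive (Rival) Similarity
(FRiS)
$F(z, 1 \mid 2) = \frac{r_2 - r_1}{r_2 + r_1}$
[Figure: object z at distance r1 from pattern A and distance r2 from pattern B; F varies between +1 and -1 as z moves from A to B]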
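The FRiS formula can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the author's code: distances are Euclidean, r1 and r2 are taken as the distances from z to the nearest object of the own and of the rival pattern, and the function name `fris` is hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def fris(z: Point, own: Sequence[Point], rival: Sequence[Point]) -> float:
    """Competitive similarity of z with pattern `own` against pattern `rival`."""
    r1 = min(math.dist(z, p) for p in own)    # distance to the own pattern
    r2 = min(math.dist(z, p) for p in rival)  # distance to the competitor
    if r1 + r2 == 0:                          # z coincides with both patterns
        return 0.0
    return (r2 - r1) / (r2 + r1)


A = [(1.0, 0.0)]
B = [(3.0, 0.0)]
print(fris((0.0, 0.0), A, B))  # z is closer to A, so the value is positive
print(fris((0.0, 0.0), B, A))  # same z, roles swapped, sign flips
```

The value is +1 when z coincides with an object of its own pattern, -1 when it coincides with an object of the rival, and 0 when it is equidistant from both.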
Compactness
All pattern recognition methods are
based on the compactness hypothesis
Braverman E.M., 1962
Patterns are compact if
- the number of boundary points is small compared with the total number of points;
- compact patterns are separated from each other by borders that are not too elaborate.
[Figure: examples of the mutual arrangement of patterns A and B]
Compactness
Similarity between objects of one pattern
should be maximal
Similarity between objects of different patterns
should be minimal
Compactness
Defensive capacity:
Compact patterns should satisfy the condition of
maximal similarity between objects of the same pattern
$F(j, i \mid b) = (r_2 - r_1) / (r_2 + r_1)$
$D_i = \frac{1}{M_A} \sum_{j=1}^{M_A} F(j, i \mid b)$
[Figure: patterns A and B; object j of pattern A lies at distance r1 from object i of A and at distance r2 from object b of B]
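Compactness
Tolerance:
Compact patterns should satisfy the condition of
maximal difference between these objects and the
objects of other patterns
$F(q, s \mid i) = (r_2 - r_1) / (r_2 + r_1)$
$T_i = \frac{1}{M_A M_B} \sum_{i=1}^{M_A} \sum_{q=1}^{M_B} F(q, s \mid i)$
[Figure: patterns A and B; object q of pattern B lies at distance r1 from object s of B and at distance r2 from object i of A]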
$C_i = (D_i + T_i) / 2$
$C_A = \frac{1}{M_A} \sum_{i=1}^{M_A} C_i$
$C_B = \frac{1}{M_B} \sum_{q=1}^{M_B} C_q$
$C = C_A \cdot C_B$
Selection of the standards (stolps)
Algorithm FRiS-Stolp
$\max_i C_i$, where $C_i = (D_i + T_i) / 2$
[Figure: value of FRiS for points on a plane]
Criteria
Informativeness by Fisher
for normal distributions
$I_F = \frac{|\mu_1 - \mu_2|}{\sigma_1^2 + \sigma_2^2}$
Compactness has the same meaning and can be used as a
criterion of informativeness that is invariant to the
distribution law and to the ratio of N to M
Selection of features
[Diagram: initial set of features Xo <1, 2, 3, ..., j, ..., N>; engine GRAD; variant of subset X <1, 2, ..., n>; criterion FRiS-compactness (good / bad)]
GRAD
Algorithm GRAD
It is based on a combination of two greedy algorithms:
forward and backward search.
In the forward stage the Addition algorithm is used
J.L. Barabash, 1963
$L_A = N + (N-1) + (N-2) + \dots + (N-n+1) = \sum_{j=0}^{n-1} (N-j)$
In the backward stage the Deletion algorithm is used
Merill T. and Green O.M., 1963
$L_D = N + (N-1) + (N-2) + \dots + (n+1) = \sum_{j=0}^{N-n-1} (N-j)$
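As a quick check of the two sums, here is a small Python sketch. It reads L_A and L_D as the number of candidate subsets each greedy search evaluates when going from N attributes to a subset of size n; that reading, and the function names, are assumptions made for illustration.

```python
def addition_cost(N: int, n: int) -> int:
    """L_A = N + (N-1) + ... + (N-n+1): evaluations in n forward steps."""
    return sum(N - j for j in range(n))


def deletion_cost(N: int, n: int) -> int:
    """L_D = N + (N-1) + ... + (n+1): evaluations in N-n backward steps."""
    return sum(N - j for j in range(N - n))


# Example: choose 3 attributes out of 10.
print(addition_cost(10, 3))  # 10 + 9 + 8 = 27
print(deletion_cost(10, 3))  # 10 + 9 + ... + 4 = 49
```

For n much smaller than N the forward search is far cheaper, which is one reason to combine short forward and backward runs instead of a full backward pass.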
Algorithm AdDel
To reduce the influence of accumulated errors, a relaxation method is applied.
n1 - the number of most informative attributes added to the subsystem (Add),
n2 < n1 - the number of least informative attributes eliminated from the subsystem (Del).
AdDel relaxation method: n steps forward, n/2 steps back
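The Add/Del relaxation can be sketched as follows. This is a minimal Python illustration, not the author's implementation: the `quality` callback stands in for whatever criterion rates a subset (the slides use FRiS-compactness), and the stopping rule `target` (desired subset size) is an assumption, since the slide does not state one.

```python
from typing import Callable, Hashable, List, Sequence


def addel(features: Sequence[Hashable],
          quality: Callable[[List[Hashable]], float],
          n1: int, n2: int, target: int) -> List[Hashable]:
    """Greedy search: n1 steps forward (Add), then n2 < n1 steps back (Del)."""
    if not 0 <= n2 < n1:
        raise ValueError("need 0 <= n2 < n1")
    target = min(target, len(features))
    selected: List[Hashable] = []
    remaining = list(features)
    while len(selected) < target:
        # Add: take the attribute that improves the subset the most.
        for _ in range(n1):
            if len(selected) >= target:
                break
            best = max(remaining, key=lambda f: quality(selected + [f]))
            selected.append(best)
            remaining.remove(best)
        if len(selected) >= target:
            break
        # Del: drop the attribute whose removal hurts the subset the least.
        for _ in range(n2):
            if len(selected) <= 1:
                break
            worst = max(selected,
                        key=lambda f: quality([g for g in selected if g != f]))
            selected.remove(worst)
            remaining.append(worst)
    return selected


# Toy criterion (hypothetical): each attribute has an additive weight.
weights = {0: 5.0, 1: 1.0, 2: 4.0, 3: 0.5, 4: 3.0, 5: 2.0}
print(addel(list(weights), lambda s: sum(weights[f] for f in s), 2, 1, 3))
```

Because n2 < n1, every Add/Del cycle grows the subset by at least one attribute, so the loop terminates; the backward steps give the search a chance to discard attributes that were picked early and later became redundant.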
Algorithm AdDel.
Reliability (R) of recognition in
spaces of different dimension.
R(AdDel) > R(DelAd) > R(Ad) > R(Del)
Algorithm GRAD
• AdDel can work with groups of attributes (granules) of different capacity
m = 1, 2, 3, ...
The granules can be formed by exhaustive search.
• But: the problem of combinatorial explosion!
Solution: rely on the individual informativeness of attributes.
This makes it possible to granulate only the most
informative part of the attributes.
[Figure: frequency f with which an attribute falls into the informative subsystem versus its rank L by individual informativeness]
Algorithm GRAD
(GRanulated AdDel)
1. Independent testing of N attributes.
Selection of the m1 << N best (m1 granules of power 1)
2. Forming $C_{m_1}^2$ combinations.
Selection of the m2 << $C_{m_1}^2$ best (m2 granules of power 2)
3. Forming $C_{m_1}^3$ combinations.
Selection of the m3 << $C_{m_1}^3$ best (m3 granules of power 3)
M = <m1, m2, m3> - the set of secondary attributes (granules)
AdDel selects the m* << |M| best granules, which include n* << N attributes
X  x2 ,3 x6 ,5 x9 , x25 ,...
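The granule-forming steps above can be sketched in Python. This is an illustration under assumptions, not the author's code: `quality` is a hypothetical callback that rates a tuple of attributes, and pairs and triples are built only from the m1 individually best attributes, as steps 2 and 3 suggest.

```python
from itertools import combinations
from typing import Callable, Hashable, List, Sequence, Tuple

Granule = Tuple[Hashable, ...]


def build_granules(features: Sequence[Hashable],
                   quality: Callable[[Granule], float],
                   m1: int, m2: int, m3: int) -> List[Granule]:
    """Return m1 single attributes, m2 pairs and m3 triples (the set M)."""
    # Step 1: rank attributes individually and keep the m1 best.
    top = sorted(features, key=lambda f: quality((f,)), reverse=True)[:m1]
    singles = [(f,) for f in top]
    # Step 2: all pairs of the kept attributes, keep the m2 best.
    pairs = sorted(combinations(top, 2), key=quality, reverse=True)[:m2]
    # Step 3: all triples of the kept attributes, keep the m3 best.
    triples = sorted(combinations(top, 3), key=quality, reverse=True)[:m3]
    return singles + pairs + triples


# Toy criterion (hypothetical): additive attribute weights.
weights = {0: 5.0, 1: 1.0, 2: 4.0, 3: 0.5, 4: 3.0, 5: 2.0}
granules = build_granules(list(weights),
                          lambda g: sum(weights[f] for f in g), 3, 2, 1)
print(granules)
```

Restricting the exhaustive search to the m1 best attributes keeps the number of tested combinations at C(m1, 2) + C(m1, 3) instead of growing with N.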
Criteria
Comparison of the criteria
(CV, FRiS)
[Figure: criteria Fs and U (vertical axis 0.6 to 1.1) versus noise level (0.05 to 0.3), attributes ordered by informativeness. N = 100, M = 2*100, mt = 2*35, mC = 2*65 + noise. Marked values: C = 0.661, C = 0.883]
Some real tasks
Task                                       K    M         N
Medicine:
  Diagnostics of type II diabetes          3    43        5520
  Diagnostics of prostate cancer           4    322       17153
  Recognition of type of leukemia          2    38        7129
  Microarray data                          2    1000      500000
  9 genetic tables                         2    50-150    2000-12000
Physics:
  Complex analysis of spectra              7    20-400    1024
Commerce:
  Forecasting of book sales                -    4812      1862
  (Data Mining Cup 2009)
Recognition of two types of Leukemia
- ALL and AML
         Training set   Control set
Total    38             34
ALL      27             20
AML      11             14
N = 7129
I. Guyon, J. Weston, S. Barnhill, V. Vapnik.
Gene Selection for Cancer Classification using
Support Vector Machines.
Machine Learning, 2002, 46(1-3), pp. 389-422.
Pentium T=15 sec
Pentium T=3 hours
In the first 27
subspaces P = 34/34
I. Guyon, J. Weston, S. Barnhill, V. Vapnik
        Training set (38)       Test set (34)
Ng      Vsuc   Vext   Vmed     Tsuc   Text   Tmed   P
7129    0.95   0.01   0.42     0.85   -0.05  0.42   29
4096    0.82   -0.67  0.30     0.71   -0.77  0.34   24
2048    0.97   0.00   0.51     0.85   -0.21  0.41   29
1024    1.00   0.41   0.66     0.94   -0.02  0.47   32
512     0.97   0.20   0.79     0.88   0.01   0.51   30
256     1.00   0.59   0.79     0.94   0.07   0.62   32
128     1.00   0.56   0.80     0.97   -0.03  0.46   33
64      1.00   0.45   0.76     0.94   0.11   0.51   32
32      1.00   0.45   0.65     0.97   0.00   0.39   33
16      1.00   0.25   0.66     1.00   0.03   0.38   34
8       1.00   0.21   0.66     1.00   0.05   0.49   34
4       0.97   0.01   0.49     0.91   -0.08  0.45   31
2       0.97   -0.02  0.42     0.88   -0.23  0.44   30
1       0.92   -0.19  0.45     0.79   -0.27  0.23   27
FRE
Zagoruiko N., Borisova I., Dyubanov V., Kutnenko O.
FRiS      Decision rules (name of gene / weight)     P
0.72656   537/1, 1833/1, 2641/2, 4049/2              34
0.71373   1454/1, 2641/1, 4049/1                     34
0.71208   2641/1, 3264/1, 4049/1                     34
0.71077   435/1, 2641/2, 4049/2, 6800/1              34
0.70993   2266/1, 2641/2, 4049/2                     34
0.70973   2266/1, 2641/2, 2724/1, 4049/2             34
0.70711   2266/1, 2641/2, 3264/1, 4049/2             34
0.70574   2641/2, 3264/1, 4049/2, 4446/1             34
0.70532   435/1, 2641/2, 2895/1, 4049/2              34
0.70243   2641/2, 2724/1, 3862/1, 4049/2             34
          2641/1, 4049/1                             33
          2641/1                                     32
Best features   SVM        FRiS
803, 4846       30 (88%)   33 (97%)
4846            27 (79%)   30 (88%)
[Figure: projection of the training set onto features 2641 and 4049; classes AML and ALL]
Comparison with 10 methods
• Jeffery I., Higgins D., Culhane A. Comparison and
evaluation of methods for generating differentially
expressed gene lists from microarray data.
http://www.biomedcentral.com/1471-2105/7/359
9 tasks on microarray data. 10 methods of feature selection.
Independent attributes. Selection of the n first (best).
Criterion: minimum of errors on CV, 10 times by 50%.
Decision rules:
Support Vector Machine (SVM),
Between Group Analysis (BGA),
Naive Bayes Classification (NBC), K-Nearest Neighbors (KNN).
Methods of selection
Methods                                                         Results
Significance analysis of microarrays (SAM)                      42
Analysis of variance (ANOVA)                                    43
Empirical Bayes t-statistic                                     32
Template matching                                               38
maxT                                                            37
Between group analysis (BGA)                                    43
Area under the receiver operating characteristic curve (ROC)    37
Welch t-statistic                                               39
Fold change                                                     47
Rank products                                                   42
FRiS-GRAD                                                       12
Empirical Bayes t-statistic: for a medium-sized set of objects
Area under the ROC curve: for small noise and a large set
Rank products: for large noise and a small set
Results of comparison
Task       N0      m1/m2    max of 4   GRAD
ALL1       12625   95/33    100.0      100.0
ALL2       12625   24/101   78.2       80.8
ALL3       12625   65/35    59.1       73.8
ALL4       12625   26/67    82.1       83.9
Prostate   12625   50/53    90.2       93.1
Myeloma    12625   36/137   82.9       81.4
ALL/AML    7129    47/25    95.9       100.0
DLBCL      7129    58/19    94.3       93.5
Colon      2000    22/40    88.6       89.5
average                     85.7       88.4
Unsettled problems
• Censoring of the training set
• Recognition with boundary
• Stolp + corridor (FRiS + LDR)
• Imputation
• Associations
• Unification of tasks of different types (UC+X)
• Optimization of algorithms
• Implementation of a program system (OTEX 2)
• Applications (medicine, genetics, ...)
• ...
Conclusion
FRiS-function:
1. Provides an effective measure of similarity, informativeness and compactness
2. Provides unification of methods
3. Provides high quality of decisions
Publications:
http://math.nsc.ru/~wwwzag
Thank you!
• Questions, please?
Stolp
Decision rules
Choosing the standards (stolps)
A stolp is an object that
protects its own objects
and does not attack the objects of other patterns
Defensive capacity:
The similarity of own objects to the stolp should be maximal:
a minimum of missed targets
Tolerance:
The similarity of other patterns' objects to the stolp should be minimal:
a minimum of false alarms
Stolp
Algorithm FRiS-Stolp
Compact patterns should satisfy
two conditions:
Defensive capacity:
Maximal similarity
of own objects to stolp i
$F(j,i) \mid b = (R_2 - R_1) / (R_2 + R_1)$
$DC_i = \frac{1}{M_A} \sum_{j=1}^{M_A} F(j,i) \mid b$
Tolerance:
Maximal difference
of other patterns' objects from stolp i
$T_i = \frac{1}{M_B} \sum_{q=1}^{M_B} F(q,s) \mid i$
$S_i = \frac{1}{2} (DC_i + T_i)$
[Figure: patterns A and B; objects i and j of A, objects b, s and q of B, with distances R1 and R2]
Decision rules
Algorithm FRiS-Stolp
[Figure: examples of taxonomy produced by the FRiS-Class algorithm]
Comparison of FRiS-Class
with other taxonomy algorithms
[Figure: curves for FRiS-Cluster, Kmeans, Forel, Scat and FRiS-Tax; vertical axis 0.3 to 0.9, horizontal axis K from 2 to 15]