Pattern Teams

Building Global Models from Local Patterns
A.J. Knobbe
Feature-continuum: attributes → (constructed) features → patterns → classifiers → target concept
Two-phased process
• Pattern Discovery phase
  – frequent patterns
  – correlated patterns
  – interesting subgroups
  – decision boundaries
  – …
• Pattern Combination phase
  – redundancy reduction
  – dependency modeling
  – global model building
  – …
  – yielding Pattern Teams, pattern networks, global predictive models, …
• Break discovery up into two phases
• Transform a complex problem into a simpler one
Task: Subgroup Discovery
Subgroup Discovery: find subgroups that show a substantially different distribution of the target concept.
• top-down search for patterns
• inductive constraints (sometimes monotonic)
• evaluation measures: novelty, χ², information gain
• also known as rule discovery, correlated pattern mining
Novelty
• Also known as weighted relative accuracy
• Balance between coverage and unexpectedness
• nov(S,T) = p(ST) − p(S)p(T)
• Ranges between −0.25 and 0.25; 0 means uninteresting

Worked example (rows: subgroup S, columns: target T):

                target T   target F
  subgroup T      .42        .13       .55
  subgroup F      .12        .33       .45
                  .54        .46       1.0

nov(S,T) = p(ST) − p(S)p(T) = .42 − .55 × .54 = .42 − .297 = .123
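A minimal sketch (not from the slides) of computing novelty from two Boolean membership vectors; the arrays below are synthetic and simply reproduce the contingency table above.

```python
import numpy as np

def novelty(subgroup, target):
    """nov(S,T) = p(ST) - p(S) * p(T) for two Boolean membership vectors."""
    subgroup = np.asarray(subgroup, dtype=bool)
    target = np.asarray(target, dtype=bool)
    p_st = np.mean(subgroup & target)  # joint probability p(ST)
    return p_st - subgroup.mean() * target.mean()

# Synthetic data reproducing the table: p(ST) = .42, p(S) = .55, p(T) = .54
n = 100
s = np.zeros(n, dtype=bool)
t = np.zeros(n, dtype=bool)
s[:55] = True       # 55 examples fall in the subgroup
t[:42] = True       # 42 of those also have the target
t[55:67] = True     # 12 more have the target but lie outside the subgroup
print(round(novelty(s, t), 3))  # 0.42 - 0.55 * 0.54 = 0.123
```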
Demo Subgroup Discovery
redundancy exists in the set of local patterns
Demo Subgroup Discovery
[chart from the demo; y-axis 0–500, x-axis 1–4,677]
Pattern Combination phase
• Feature selection, redundancy reduction
  – Pattern Teams
• Dependency modeling
  – Bayesian networks
  – Association rules
• Global modeling
  – Classifiers, regression models
Pattern Teams & Pattern Networks
Pattern Teams
• Pattern Discovery typically produces very many patterns with high levels of redundancy
• Report a small, informative subset with specific properties
• Promote dissimilarity of the patterns reported
• Additional value of individual patterns
• Consider the extent of patterns
  – treat patterns as binary features/items
Intuitions
• No two patterns should cover the same set of examples
• No pattern should cover the complement of another pattern
• No pattern should cover a logical combination of two or more other patterns
• Patterns should be mutually exclusive
• The pattern set should lead to the best-performing classifier
• Patterns should lie on the convex hull in ROC space
Quality measures for pattern sets
Judge pattern sets on the basis of a quality function:
• Joint Entropy (miki) (unsupervised; sketch below)
• Exclusive Coverage (unsupervised)
• Wrapper accuracy (supervised)
• Area Under Curve in ROC space (supervised)
• Bayesian Dirichlet equivalent uniform (supervised)
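As a hedged illustration of the first measure, the sketch below computes the joint entropy of a set of binary pattern columns and greedily selects a pattern team of size k by maximizing it. The function names and the greedy forward-selection strategy are illustrative assumptions, not necessarily the exact procedure behind the slides.

```python
import numpy as np

def joint_entropy(patterns):
    """Joint entropy (in bits) of a set of binary pattern columns.

    `patterns` is an (n_examples, n_patterns) 0/1 array; every row is
    one joint truth-assignment of the pattern set."""
    patterns = np.asarray(patterns)
    # count how often each distinct truth-assignment occurs
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def greedy_pattern_team(patterns, k):
    """Forward selection of k pattern columns maximizing joint entropy."""
    patterns = np.asarray(patterns)
    k = min(k, patterns.shape[1])
    team = []
    while len(team) < k:
        best, best_h = None, -1.0
        for j in range(patterns.shape[1]):
            if j in team:
                continue
            h = joint_entropy(patterns[:, team + [j]])
            if h > best_h:
                best, best_h = j, h
        team.append(best)
    return team
```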
Pattern Teams
[scatter plots: 82 subgroups discovered; 4 subgroups in the pattern team]
Pattern Network
• Again, treat patterns as binary features
• Bayesian networks
  – conditional independence of patterns
• Explain relationships between patterns
• Explain role of patterns in Pattern Team (rough sketch below)
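Learning a full Bayesian network over the patterns is beyond a short example. As a rough stand-in (an assumption on my part, not the method on the slides), the sketch below links pairs of patterns whose mutual information over their binary truth values exceeds a threshold, hinting at the dependencies such a network would capture.

```python
import numpy as np

def mutual_information(a, b):
    """Mutual information (bits) between two binary columns."""
    a = np.asarray(a, dtype=int)
    b = np.asarray(b, dtype=int)
    mi = 0.0
    for va in (0, 1):
        for vb in (0, 1):
            p_ab = np.mean((a == va) & (b == vb))
            if p_ab > 0:
                p_a = np.mean(a == va)
                p_b = np.mean(b == vb)
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return mi

def dependency_edges(patterns, threshold=0.05):
    """Return pairs of pattern indices whose mutual information exceeds `threshold`."""
    patterns = np.asarray(patterns)
    edges = []
    for i in range(patterns.shape[1]):
        for j in range(i + 1, patterns.shape[1]):
            if mutual_information(patterns[:, i], patterns[:, j]) > threshold:
                edges.append((i, j))
    return edges
```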
Demo Pattern Team & Network
redundancy removed to find truly diverse patterns, in this case using maximization of joint entropy
Demo Pattern Team & Network
the pattern team and related patterns can be presented in a Bayesian network
[figure: network over the demo data, with annotations "peak around 16k", "peak around 39k", "peak around 89k"]
Properties of SD phase in PC
What knowledge about Subgroup Discovery parameters can be exploited in Combination?
• Interestingness
  – Are interesting subgroups diverse?
  – Are interesting subgroups correlated?
• Information content
• Support of patterns
[plot: joint entropy of 2 interesting subgroups (0–2.5 bits) against novelty (0–0.25); annotations: "subgroups are relatively novel, up to 2 bits of information", "subgroups are very novel, 1 bit of information"]
correlation of interesting subgroups
[plot: inter-novelty vs. novelty of subgroups (both −0.25 to 0.25); annotations: "subgroups are very novel, and correlate", "subgroups are novel, but potentially independent"]
Building Classifiers from Local Patterns
Combination strategies
How to interpret a pattern set? (a sketch follows the list)
• Conjunctive (intersection of patterns)
• Disjunctive (union of patterns)
• Majority vote (equal-weight linear separator)
• …
• Contingencies/Classifiers
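The first three interpretations are straightforward over a 0/1 pattern matrix; a minimal sketch (function and strategy names are illustrative):

```python
import numpy as np

def combine(patterns, strategy="majority"):
    """Turn an (n_examples, n_patterns) 0/1 matrix into one prediction per example."""
    patterns = np.asarray(patterns, dtype=bool)
    if strategy == "conjunctive":   # intersection: all patterns must fire
        return patterns.all(axis=1)
    if strategy == "disjunctive":   # union: at least one pattern fires
        return patterns.any(axis=1)
    if strategy == "majority":      # equal-weight vote; ties count as negative
        return patterns.sum(axis=1) * 2 > patterns.shape[1]
    raise ValueError(f"unknown strategy: {strategy}")
```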
Decision Table Majority (DTM)
• Treat every truth-assignment as a contingency
• Classification based on conditional probability
• Use the majority class for empty contingencies
• Only works with a Pattern Team (else overfitting); see the sketch below
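A minimal sketch of a DTM classifier along the lines described above, assuming patterns are given as a 0/1 matrix and labels as an array; the class and attribute names are illustrative assumptions.

```python
import numpy as np
from collections import Counter, defaultdict

class DecisionTableMajority:
    """Decision Table Majority over a pattern team.

    Every truth-assignment of the patterns is one contingency; prediction is
    the majority class within that contingency, falling back to the global
    majority class for contingencies that are empty in the training data."""

    def fit(self, patterns, y):
        patterns = np.asarray(patterns, dtype=int)
        y = np.asarray(y)
        self.default_ = Counter(y).most_common(1)[0][0]  # global majority class
        table = defaultdict(Counter)
        for row, label in zip(patterns, y):
            table[tuple(row)][label] += 1
        # majority class per observed truth-assignment
        self.table_ = {key: counts.most_common(1)[0][0]
                       for key, counts in table.items()}
        return self

    def predict(self, patterns):
        patterns = np.asarray(patterns, dtype=int)
        return np.array([self.table_.get(tuple(row), self.default_)
                         for row in patterns])
```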
Support Vector Machine (SVM)
SVM with a linear kernel (sketch below):
• Binary data
• All dimensions have the same scale
• Works with large pattern sets
• Subgroup discovery has removed XOR-like dependencies
• Interesting subgroups correlate
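A hedged sketch of this step using scikit-learn's LinearSVC on a synthetic 0/1 pattern matrix; the data and target below are placeholders, not the deck's dataset.

```python
import numpy as np
from sklearn.svm import LinearSVC

# patterns: (n_examples, n_patterns) 0/1 matrix of pattern memberships (synthetic here)
rng = np.random.default_rng(0)
patterns = rng.integers(0, 2, size=(200, 10))
y = (patterns[:, 0] | patterns[:, 1]).astype(int)  # toy target

# Binary features already share one scale, so no normalisation is needed,
# and a linear kernel suffices once XOR-like dependencies have been
# resolved in the subgroup discovery phase.
clf = LinearSVC(C=1.0, max_iter=10_000).fit(patterns, y)
print(clf.score(patterns, y))
```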
XOR-like dependencies
[diagram: two patterns p1 and p2; the four truth-assignments (0,0), (0,1), (1,0), (1,1) plotted in p1/p2-space illustrate an XOR-like dependency, which a linear separator cannot capture]
Division of labour between 2 phases
• Subgroup Discovery Phase
  – Feature selection
  – Decision boundary finding/thresholding
  – Multivariate dependencies (XOR)
• Pattern Combination Phase
  – Pattern selection
  – Combination (XOR?)
  – Class assignment
Combination-aware Subgroup Discovery
• Better global model
• Superficially uninteresting patterns can be reported
  – subgroups that are not novel individually can still form an optimal team
• Pruning of the search space (new rule measures)
Combination-aware Subgroup Discovery
Subgroup Discovery++: find a set of subgroups that show a substantially different distribution of the target concept.
Considerations:
– support of pattern
– diversity of pattern
– …
Conclusions
• Less hasty approach to model building
• Interesting patterns serve two purposes
  – understandable knowledge
  – building blocks of a global model
• Pattern discovery without combination is limited
• Information exchange between phases
• Integration of the two phases is non-trivial