Towards a Minimum Description
Length Based Stopping Criterion for
Semi-Supervised Time Series
Classification
Nurjahan Begum, Bing Hu,
Thanawin Rakthanmanon, and Eamonn Keogh
Outline

- Introduction
- Proposed Stopping Criterion
  - Motivation of a Stopping Criterion for Semi-Supervised Classification
  - The Minimum Description Length (MDL) technique
  - Our Approach
- Experimental Results
- Conclusion
Introduction

We have developed a Minimum Description Length based stopping criterion for semi-supervised time series classification.

- Why semi-supervised learning?
- Why do we need a stopping criterion?
Why Semi-Supervised Learning?

Labeled data
- Scarce and extremely expensive*
- Requires human intervention

Unlabeled data
- Abundant
- The PhysioBank archive* has more than 700 GB of digitized signals and time series freely available

Semi-supervised classification
- Needs less labeled data
- Requires less human effort and usually obtains higher accuracy*

* F. Florea, et al., Medical Image Categorization with MedIC and MedGIFT (2006)
* A. L. Goldberger, et al., PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals (2000)
* L. Wei, et al., Semi-Supervised Time Series Classification (2006)
Why do we need a Stopping Criterion?

(Figure, shown as a build over four slides: ECG instances from a Cardiac Tamponade patient and a Normal patient; semi-supervised classification keeps adding instances to the positive set at each iteration)

Oops… we are adding false positives!
Our Contribution

- A novel, parameter-free stopping criterion using Minimum Description Length (MDL) for semi-supervised time series classification
- Allows easy adoption by experts in the medical community
Minimum Description Length (MDL)

MDL is a formalization of Occam's Razor: the best hypothesis for a given set of data is the one that leads to the best compression of the data.

Why MDL?
- Intrinsically parameter-free
- Leverages the true underlying structure of the data
- Avoids needing to explain all of the data
- Has recently shown great potential for real-valued time series data
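The compression idea above can be sketched as a two-part description length, the form used in the worked example on the later slides: the hypothesis is stored raw at log2(cardinality) bits per symbol, and each instance it encodes costs one correction (position index plus corrected value) per mismatching symbol. This is an illustrative sketch, not the paper's exact implementation:

```python
import math

def description_length(hypothesis, instances, cardinality=16):
    """Two-part MDL cost (a sketch): the hypothesis is stored raw,
    and each instance is stored as its mismatches against the
    hypothesis (position index + corrected symbol)."""
    m = len(hypothesis)
    sym = math.log2(cardinality)           # bits per raw symbol
    corr = math.ceil(math.log2(m)) + sym   # bits per correction
    total = m * sym                        # cost of the hypothesis itself
    for inst in instances:
        diffs = sum(1 for a, b in zip(hypothesis, inst) if a != b)
        total += diffs * corr
    return total

# A perfectly matching instance is free; one mismatch costs
# ceil(log2(4)) + log2(16) = 6 bits here.
print(description_length([3, 7, 7, 2], [[3, 7, 7, 2], [3, 0, 7, 2]]))
```

A hypothesis that compresses many instances cheaply (few corrections) wins; one that explains nothing pays the full raw cost for every instance.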
Our Approach

(Figure: a long original time series, with the given positive instance highlighted)

- Discretize the time series
- Repeat
  - Find the nearest neighbor of the positive instance set
  - Calculate the BitCount
- Until the BitCount increases
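The loop above can be sketched in Python. Here `dl` is an illustrative description-length function (such as the MDL bit count), and the nearest-neighbor distance is assumed Euclidean; both are assumptions for the sketch, not the paper's exact choices:

```python
import numpy as np

def mdl_semi_supervised(labeled, unlabeled, dl):
    """Sketch of the loop: repeatedly move the unlabeled instance
    nearest to the positive set into it, stopping when the
    description length dl(positives, remaining) rises."""
    pos = list(labeled)
    rest = list(unlabeled)
    best = dl(pos, rest)
    while rest:
        # nearest neighbor of the positive set among remaining instances
        i = min(range(len(rest)),
                key=lambda j: min(np.linalg.norm(np.asarray(rest[j]) - np.asarray(p))
                                  for p in pos))
        cand_pos = pos + [rest[i]]
        cand_rest = rest[:i] + rest[i + 1:]
        cost = dl(cand_pos, cand_rest)
        if cost > best:      # BitCount increased: stop
            break
        best = cost
        pos, rest = cand_pos, cand_rest
    return pos
```

With a cost function that penalizes absorbing far-away instances, the loop absorbs the close ones and halts before a distant outlier, which is exactly the behavior the walkthrough on the next slides illustrates.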
Discrete Normalization (Why?)

- MDL is defined in discrete space
- Time series are real-valued
- We need to normalize real-valued data into a space of reduced cardinality

Won't such a drastic information reduction lose meaningful information?
Will Discrete Normalization lose meaningful information?

The answer is NO!

Justification? A time series clustering experiment* [1][2]:

(Figure: a real-valued time series and its discretized version at cardinality 16 are visually almost indistinguishable)

[1] B. Hu, et al., Discovering the Intrinsic Cardinality and Dimensionality of Time Series Using MDL (2011)
[2] T. Rakthanmanon, et al., Time Series Epenthesis: Clustering Time Series Streams Requires Ignoring Some Data (2011)
* incartdb dataset (Record I70, Signal II) [www.physionet.org]
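Discrete normalization can be sketched as z-normalizing the series and then quantizing it into equal-width bins. The binning scheme here is an assumption for illustration; the cited papers describe the exact reduction used:

```python
import numpy as np

def discretize(ts, cardinality=16):
    """Z-normalize a real-valued series, then quantize it into
    `cardinality` equal-width bins (a sketch of discrete
    normalization; the exact scheme may differ)."""
    ts = np.asarray(ts, dtype=float)
    z = (ts - ts.mean()) / ts.std()                    # z-normalize
    lo, hi = z.min(), z.max()
    bins = np.floor((z - lo) / (hi - lo) * cardinality).astype(int)
    return np.clip(bins, 0, cardinality - 1)           # symbols 0..15

x = np.sin(np.linspace(0, 2 * np.pi, 100))
d = discretize(x)                                      # integers in [0, 15]
```

At cardinality 16 each sample costs only log2(16) = 4 bits, which is what makes the bit-count arithmetic on the following slides possible.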
Our Approach

Iteration 0: the hypothesis H is the discretized given positive instance (length 100, cardinality 16). H is stored raw, and so are the 6 remaining instances:

BitCount = 100 × log2(16) + 6 × 100 × log2(16) = 2800

(Figure: BitCount plotted against the number of instances encoded)
Iteration 1: the nearest neighbor of the positive set is encoded via H with 6 corrections; 5 instances remain raw:

BitCount = 100 × log2(16) + 6 × (ceil(log2(100)) + log2(16)) + 5 × 100 × log2(16) = 2466
Iteration 2: two instances are now encoded via H, with 22 cumulative corrections; 4 remain raw:

BitCount = 100 × log2(16) + 22 × (ceil(log2(100)) + log2(16)) + 4 × 100 × log2(16) = 2242
Iteration 3: three instances encoded, 37 cumulative corrections; 3 remain raw:

BitCount = 100 × log2(16) + 37 × (ceil(log2(100)) + log2(16)) + 3 × 100 × log2(16) = 2007
Iteration 4: four instances encoded, but the newest one needs many corrections (115 cumulative); 2 remain raw. The BitCount rises for the first time:

BitCount = 100 × log2(16) + 115 × (ceil(log2(100)) + log2(16)) + 2 × 100 × log2(16) = 2465
Iteration 5: five instances encoded, 192 cumulative corrections; 1 remains raw:

BitCount = 100 × log2(16) + 192 × (ceil(log2(100)) + log2(16)) + 1 × 100 × log2(16) = 2912

Stopping point: the BitCount keeps increasing, so the algorithm stops adding instances to the positive set.
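The bit counts in the walkthrough above can be reproduced directly. The cumulative mismatch counts (6, 22, 37, 115, 192) are read off the slides; everything else follows from length 100 and cardinality 16:

```python
import math

M, C = 100, 16                         # series length and cardinality from the slides
SYM = math.log2(C)                     # bits per raw symbol = 4
CORR = math.ceil(math.log2(M)) + SYM   # bits per correction = 7 + 4 = 11

# cumulative mismatch counts against H, iterations 1..5
MISMATCHES = [6, 22, 37, 115, 192]

def bitcount(i):
    """Total description length after i instances have been encoded via H."""
    corrections = MISMATCHES[i - 1] * CORR if i > 0 else 0
    raw = (6 - i) * M * SYM            # instances still stored without the hypothesis
    return M * SYM + corrections + raw

print([int(bitcount(i)) for i in range(6)])  # [2800, 2466, 2242, 2007, 2465, 2912]
```

The sequence decreases while the absorbed instances genuinely resemble the positive class, and turns upward once the corrections outweigh the savings, which is the signal the stopping criterion exploits.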
Experimental Results

Dataset: MIT-BIH Supraventricular Arrhythmia Database (svdb)
  Recordings: 78 (each ½ hour long)
  Record: 801, Signal: ECG1
  Target class instances: 268 (Premature Ventricular Contractions)

Dataset: St. Petersburg Institute of Cardiological Technics 12-lead Arrhythmia Database (incartdb)
  Recordings: 75 (each ½ hour long)
  Record: I70, Signal: II
  Target class instances: 126 (Atrial Premature Beats)

Dataset: Sudden Cardiac Death Holter Database (sddb)
  Recordings: 23 (each 7½ hours long)*
  Record: 52, Signal: ECG1
  Target class instances: 216 (R-on-T Premature Ventricular Contractions)

* We worked with ~1 hour long data
Interpreting the plots

Each plot shows BitCount versus the number of instances encoded, with the stopping point marked. Four outcomes are possible:
- Ideal
- Bad: adds false positives
- Bad: misses true positives
- Really bad
Experimental Results

(Figure: BitCount versus number of instances encoded for the svdb, incartdb, and sddb datasets, with the stopping points marked)
Experimental Results (Contd.)

(Figure: BitCount versus number of instances encoded for the Fish_test, FaceAll_test, and Swedish_leaf datasets, with the stopping points marked)
Comparison with the state-of-the-art algorithm

(Figure, Fish_test: top, the minimal distance curve with the too-early stopping point of Wei et al.'s method*; bottom, our BitCount curve with its stopping point)

* L. Wei, et al., Semi-Supervised Time Series Classification (2006)
Conclusions

- A novel way of doing semi-supervised classification with only one labeled instance.
- Previous approaches to stopping semi-supervised classification required extensive parameter tuning and remained something of a black art.
- We proposed a stopping criterion for semi-supervised classification based on MDL.
- To our knowledge, our stopping criterion is the first parameter-free criterion that mitigates the early stopping problem and leverages the inherent structure of the data.
Thank you!

If you have any questions, please contact me:
Name: Nurjahan Begum
Email: nbegu001@ucr.edu