Handwritten Digit Recognition

Introduction
Data Preprocessing
Feature Extraction
Feature Selection
Classification
Comparison and Conclusion
Suggestions
Introduction
Record
Classification
Detection
Database
• MNIST database
Preprocessing and Database
Methods
• Segmentation: 28 by 28 segments
• Feature extraction: DWT coefficients (Daubechies, Symlet, Coiflet, Haar, biorthogonal)
• Feature selection: PCA, MSPCA, KS statistical tests
• Classification: KNN; compared against SVM, NN, RBF, linear classifier, boosted stumps, NonL
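MNIST digits already come as 28 by 28 images, so the segmentation step matters mainly for raw scans. The sketch below is an assumption, not the authors' procedure: it expects one digit per input array, crops it to its bounding box, rescales the longer side to 20 pixels with nearest-neighbour sampling, and centres it in a 28 by 28 field. That is loosely modelled on MNIST's own normalization, which centres by centre of mass rather than by bounding box. `to_28x28` is a hypothetical name.

```python
import numpy as np

def to_28x28(img, box=20, size=28):
    """Crop the ink to its bounding box, scale the longer side to `box` pixels
    (nearest neighbour), and centre the result on a size x size canvas."""
    img = np.asarray(img)
    ys, xs = np.nonzero(img)
    if ys.size == 0:  # blank input: nothing to segment
        return np.zeros((size, size), dtype=img.dtype)
    crop = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = crop.shape
    scale = box / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    # map every output pixel back to its nearest source pixel
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    resized = crop[rows[:, None], cols[None, :]]
    out = np.zeros((size, size), dtype=img.dtype)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out


page = np.zeros((100, 80))
page[30:70, 40:50] = 1.0  # a tall 40x10 stroke somewhere on a larger scan
digit = to_28x28(page)
print(digit.shape, int(digit.sum()))  # → (28, 28) 100
```

Centring by bounding box keeps the sketch short. Centring by centre of mass would instead shift the pasted digit by the offset between the canvas centre and the ink's weighted mean position.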
Feature Extraction
[Figure: DWT decompositions with the Haar, Db20, Bior1.5 and Sym5 wavelets]
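The feature-extraction slides compare decompositions with several wavelet families. Of these, only Haar is simple enough to write by hand. The sketch below implements a multi-level orthonormal 2-D Haar DWT in NumPy and returns coefficients in the `[cA_n, (cH_n, cV_n, cD_n), ..., (cH_1, cV_1, cD_1)]` order used by PyWavelets. For Db20, Bior1.5 or Sym5, a library call such as `pywt.wavedec2(img, "sym5", level=2)` would be the usual route. The function names, the padding rule and the sign conventions of the detail bands are assumptions of this sketch and may differ from PyWavelets.

```python
import numpy as np

def haar_dwt2(x):
    """One level of an orthonormal 2-D Haar DWT.

    Returns the approximation cA and the detail bands (cH, cV, cD).
    Odd-sized inputs are padded by repeating the last row/column."""
    x = np.asarray(x, dtype=float)
    pad_rows, pad_cols = x.shape[0] % 2, x.shape[1] % 2
    if pad_rows or pad_cols:
        x = np.pad(x, ((0, pad_rows), (0, pad_cols)), mode="symmetric")
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    cA = (a + b + c + d) / 2  # local average (low-pass in both directions)
    cH = (a + b - c - d) / 2  # top minus bottom: responds to horizontal edges
    cV = (a - b + c - d) / 2  # left minus right: responds to vertical edges
    cD = (a - b - c + d) / 2  # diagonal detail
    return cA, (cH, cV, cD)


def haar_wavedec2(x, level):
    """Multi-level decomposition returned as [cA_n, details_n, ..., details_1]."""
    details = []
    for _ in range(level):
        x, d = haar_dwt2(x)
        details.append(d)
    return [x] + details[::-1]


rng = np.random.default_rng(0)
digit = rng.random((28, 28))  # stand-in for one 28x28 MNIST image
coeffs = haar_wavedec2(digit, level=3)
print(coeffs[0].shape, [band[0].shape for band in coeffs[1:]])
# → (4, 4) [(4, 4), (7, 7), (14, 14)]
```

Because the transform is orthonormal, the total energy of the four bands at each level equals the energy of the input. The approximation band cA is the usual choice when flattening coefficients into a feature vector.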
Feature Selection
Feature selection methods considered:
• Statistical tests: Lilliefors, two-sided Kolmogorov-Smirnov (KS)
• PCA
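The slides list Lilliefors and two-sided KS tests but do not say how they were applied. One plausible reading, assumed here, is to compare each coefficient's distribution across two digit classes with the two-sample KS statistic and keep the coefficients whose distributions differ most. `scipy.stats.ks_2samp` computes the same statistic along with a p-value, and a Lilliefors normality test is available as `statsmodels.stats.diagnostic.lilliefors`. The NumPy version below avoids both dependencies. `ks_statistic` and `rank_coefficients` are hypothetical names.

```python
import numpy as np

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_x(t) - F_y(t)|."""
    x, y = np.sort(x), np.sort(y)
    t = np.concatenate([x, y])  # the supremum is attained at a sample point
    cdf_x = np.searchsorted(x, t, side="right") / x.size
    cdf_y = np.searchsorted(y, t, side="right") / y.size
    return float(np.max(np.abs(cdf_x - cdf_y)))


def rank_coefficients(X, labels, digit_a, digit_b):
    """Order the columns of X (one DWT coefficient per column) by how
    differently they are distributed for two digit classes, largest D first."""
    A, B = X[labels == digit_a], X[labels == digit_b]
    scores = np.array([ks_statistic(A[:, j], B[:, j]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1], scores


rng = np.random.default_rng(1)
labels = np.repeat([3, 8], 50)
X = rng.normal(size=(100, 3))
X[labels == 8, 1] += 3.0  # make coefficient 1 the discriminative one
order, scores = rank_coefficients(X, labels, 3, 8)
print(order, scores.round(2))
```

With more than two digits, one option is to sum or take the maximum of the pairwise D values per coefficient before ranking.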
Feature Selection: PCA
• Principal component analysis (PCA) is probably the most widely used and well-known of the “standard” feature selection methods.
• It was introduced by Pearson (1901) and developed independently by Hotelling (1933).
• PCA takes a data matrix of n objects by p variables, which may be correlated, and summarizes it by uncorrelated axes (principal components, or principal axes) that are linear combinations of the original p variables.
• The first k components retain as much of the variation among objects as is possible with k axes.
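A minimal sketch of PCA as described above, computed with the SVD of the centred n by p data matrix. The rows of `Vt` are the principal axes, ordered by decreasing variance, and projecting onto the first k of them gives the component scores. In the pipeline summarized on the Methods slide, X would hold the flattened DWT coefficients of the training images, and k = 20 matches the results table. The test images would be projected with the same `mean` and `axes`. `pca_fit` and `pca_transform` are hypothetical names, and the synthetic data only illustrates correlated variables.

```python
import numpy as np

def pca_fit(X, k):
    """Fit PCA on an n x p data matrix X (rows = objects, columns = variables)."""
    mean = X.mean(axis=0)
    # SVD of the centred data: rows of Vt are the principal axes,
    # sorted by decreasing singular value (i.e. decreasing variance)
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # fraction of total variance per axis
    return mean, Vt[:k], explained[:k]


def pca_transform(X, mean, axes):
    """Project data onto the principal axes to get the k component scores."""
    return (X - mean) @ axes.T


rng = np.random.default_rng(0)
base = rng.normal(size=(500, 5))
X = np.hstack([base, base @ rng.normal(size=(5, 45))])  # 50 correlated variables
mean, axes, explained = pca_fit(X, k=20)
scores = pca_transform(X, mean, axes)
print(scores.shape, explained[:5].sum().round(3))  # → (500, 20) 1.0
```

Here the 50 variables are built from only 5 independent ones, so the first 5 components already carry essentially all of the variance. That is the "as much as possible of the variation" property in action.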
Classification
• Digits 6, 7 and 9
• Digits 1, 4 and 0
• Digits 3, 8 and 1
• Digits 7, 3 and 9
• Digits 0, 1 and 6
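The slides name KNN as the classifier but give neither k nor the distance measure. The sketch below assumes k = 3 and Euclidean distance on the selected features, such as the 20 PCA scores per image. It also computes the error percentage that the results tables report. `knn_predict` and `error_percent` are hypothetical names.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Label each test row by majority vote among its k nearest training rows
    (Euclidean distance). Ties go to the smallest label."""
    # pairwise squared distances, shape (n_test, n_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    preds = []
    for votes in y_train[nearest]:
        labels, counts = np.unique(votes, return_counts=True)
        # np.unique sorts labels and argmax takes the first maximum
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)


def error_percent(y_true, y_pred):
    """Misclassification rate in percent."""
    return 100.0 * np.mean(np.asarray(y_true) != np.asarray(y_pred))


X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
y_train = np.array([6, 6, 6, 9, 9, 9])
X_test = np.array([[0.5, 0.5], [10.5, 10.5]])
pred = knn_predict(X_train, y_train, X_test)
print(pred, error_percent([6, 9], pred))  # → [6 9] 0.0
```

The broadcast distance array has n_test × n_train × p entries, which is small for 200 test and 500 training samples with 20 features. Much larger sets would call for computing distances in chunks.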
Results and Conclusion
Wavelet  | PCA components used as features | Train samples | Test samples | Error (%)
Bior 1.5 | 20 | 500 | 200 | 13.33
Haar     | 20 | 500 | 200 | 26.66
Db20     | 20 | 500 | 200 | 18.33
Sym5     | 20 | 500 | 200 | 10
db2      | 20 | 500 | 200 | 20
db8      | 20 | 500 | 200 | 26.66
Sym8     | 20 | 500 | 200 | 18.33
Coif2    | 20 | 500 | 200 | 21.66
Bior 3.5 | 20 | 500 | 200 | 30
No. | Feature name | Classifier | Error (%)
1  | - | Linear classifier (1-layer NN) | 12
2  | Shape context feature extraction | K-nearest neighbors | 5
3  | - | Boosted trees (17 leaves) | 7.7
4  | Haar features | Product of stumps on Haar features | 1.02
5  | - | 1000 RBF + linear classifier | 3.6
6  | Deg-9 poly, 2-pixel jittered | Virtual SVM | 0.56
7  | 300 hidden units, mean square error | 2-layer NN | 4.7
8  | Cross-entropy [affine distortions] | 2-layer NN | 1.1
9  | - | 6-layer NN | 0.35
10 | Unsupervised sparse features | SVM | 0.59
11 | Sym5 coefficients | KNN | 10
Suggestions
1. Create a wavelet basis matched to the mean shape of the digits (using the Chapa and Rao method).
2. Cancel rotations in the digits.
3. Use more powerful statistical tests to find the best coefficients (features).
4. Use better classifiers, such as SVM or HMM.
5. Extract features other than the raw coefficients from each approximation or detail level of the image representation, such as higher-order statistical moments of the coefficient histograms.
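Suggestion 5 proposes describing each approximation or detail level by statistics of its coefficient histogram rather than by the raw coefficients. A minimal sketch follows. It assumes coefficients arrive in the `[cA_n, (cH_n, cV_n, cD_n), ..., (cH_1, cV_1, cD_1)]` layout returned by PyWavelets' `wavedec2`, and it uses mean, standard deviation, skewness and excess kurtosis per band. Which moments to keep is an assumption of this sketch, and the function names are hypothetical.

```python
import numpy as np

def moment_features(band):
    """Mean, standard deviation, skewness and excess kurtosis of the
    coefficients in one sub-band (i.e. moments of its histogram)."""
    v = np.asarray(band, dtype=float).ravel()
    mu, sd = v.mean(), v.std()
    if sd == 0:  # flat band: higher moments are undefined, report 0
        return np.array([mu, 0.0, 0.0, 0.0])
    z = (v - mu) / sd
    return np.array([mu, sd, np.mean(z**3), np.mean(z**4) - 3.0])


def subband_moment_vector(coeffs):
    """Concatenate moment features over [cA_n, (cH_n, cV_n, cD_n), ..., (cH_1, cV_1, cD_1)]."""
    bands = [coeffs[0]] + [band for level in coeffs[1:] for band in level]
    return np.concatenate([moment_features(b) for b in bands])


# toy one-level decomposition: a flat approximation and three identical detail bands
coeffs = [np.ones((4, 4)), (np.array([[-1.0, 1.0], [1.0, -1.0]]),) * 3]
vec = subband_moment_vector(coeffs)
print(vec.shape, vec[4:8])
```

Each band contributes four numbers regardless of its size. A three-level decomposition therefore yields a 40-dimensional vector (10 bands × 4), far fewer features than the raw coefficients.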