Experiment Results and Analysis

Source: 2008 World Congress on Intelligent Control and Automation (WCICA 2008).
• NMF considers factorizations of the form:

$$X \approx ZH$$

where $X \in \mathbb{R}_+^{F \times L}$, $Z \in \mathbb{R}_+^{F \times M}$, $H \in \mathbb{R}_+^{M \times L}$, and $M \ll F$.
• To measure the cost of the decomposition, one popular approach is the Kullback-Leibler (KL) divergence; the cost of factorizing $X$ into $ZH$ is evaluated as:
$$D_{NMF}(X \| ZH) = \sum_{j=1}^{L} KL(x_j \| Z h_j) = \sum_{j=1}^{L} \sum_{i=1}^{F} \left( x_{i,j} \ln \frac{x_{i,j}}{\sum_k z_{i,k} h_{k,j}} + \sum_k z_{i,k} h_{k,j} - x_{i,j} \right)$$
• Using the Expectation-Maximization (EM) algorithm and an appropriately designed auxiliary function, it has been shown in “Algorithms for non-negative matrix factorization” that the update rule for $h_{k,j}$ at the $t$-th iteration is given by:
π‘₯𝑖,𝑗
(𝑑−1)
𝑖 𝑧𝑖,π‘˜
(𝑑−1) β„Ž (𝑑−1)
𝑧
𝑙,𝑗
𝑙 𝑖,𝑙
β„Žπ‘˜,𝑗 (𝑑) = β„Žπ‘˜,𝑗 (𝑑−1)
(𝑑−1)
𝑧
𝑖,π‘˜
𝑖
• while for $z_{i,k}^{(t)}$ the update rule is given by:

$$z_{i,k}'^{(t)} = z_{i,k}^{(t-1)} \frac{\sum_j h_{k,j}^{(t-1)} \dfrac{x_{i,j}}{\sum_l z_{i,l}^{(t-1)} h_{l,j}^{(t-1)}}}{\sum_j h_{k,j}^{(t-1)}}$$
• Finally, the basis images matrix $Z$ is normalized so that the elements of each column vector sum up to one:

$$z_{i,k}^{(t)} = \frac{z_{i,k}'^{(t)}}{\sum_l z_{l,k}'^{(t)}}$$
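Putting the three steps together, here is a minimal NumPy sketch of these KL multiplicative updates; the function name, random initialization, iteration count, and `eps` guard are illustrative assumptions rather than part of the original formulation:

```python
import numpy as np

def nmf_kl(X, M, n_iter=200, eps=1e-12, seed=0):
    """KL-divergence NMF: X (F x L) ~ Z (F x M) @ H (M x L)."""
    rng = np.random.default_rng(seed)
    F, L = X.shape
    Z = rng.random((F, M)) + eps
    H = rng.random((M, L)) + eps
    for _ in range(n_iter):
        # h_kj <- h_kj * [sum_i z_ik * x_ij / (ZH)_ij] / sum_i z_ik
        R = X / (Z @ H + eps)
        H *= (Z.T @ R) / (Z.sum(axis=0, keepdims=True).T + eps)
        # z_ik <- z_ik * [sum_j h_kj * x_ij / (ZH)_ij] / sum_j h_kj
        R = X / (Z @ H + eps)
        Z *= (R @ H.T) / (H.sum(axis=1, keepdims=True).T + eps)
        # normalize each column of Z to sum to one, as on the slide
        Z /= Z.sum(axis=0, keepdims=True) + eps
    return Z, H
```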
[Figure: the image database $\mathfrak{T}$ is partitioned into $n$ classes; the $r$-th class contains clusters $1, \ldots, C_r$, and the $\theta$-th cluster of the $r$-th class holds images $1, \ldots, N^{(r)(\theta)}$. Each image $\rho$ is passed through NMF dimensionality reduction to produce a feature vector.]

Feature vector of image $\rho$ in the $\theta$-th cluster of the $r$-th class:
πœ‚πœŒ (π‘Ÿ)(πœƒ) = πœ‚πœŒ,1 (π‘Ÿ)(πœƒ) β‹― πœ‚πœŒ,𝑀 (π‘Ÿ)(πœƒ)
Mean of the feature vectors for the $\theta$-th cluster of the $r$-th class:

$$\mu^{(r)(\theta)} = \frac{1}{N^{(r)(\theta)}} \sum_{i=1}^{N^{(r)(\theta)}} \eta_i^{(r)(\theta)} = \left[ \mu_1^{(r)(\theta)} \cdots \mu_M^{(r)(\theta)} \right]^T$$
• Using the above notation we can define the within-cluster scatter matrix $S_w$ as:

$$S_w = \sum_{r=1}^{n} \sum_{\theta=1}^{C_r} \sum_{\rho=1}^{N^{(r)(\theta)}} \left( \eta_\rho^{(r)(\theta)} - \mu^{(r)(\theta)} \right) \left( \eta_\rho^{(r)(\theta)} - \mu^{(r)(\theta)} \right)^T$$
• and the between-cluster scatter matrix $S_b$ as:

$$S_b = \sum_{i=1}^{n} \sum_{\substack{r=1 \\ r \neq i}}^{n} \sum_{j=1}^{C_i} \sum_{\theta=1}^{C_r} \left( \mu^{(i)(j)} - \mu^{(r)(\theta)} \right) \left( \mu^{(i)(j)} - \mu^{(r)(\theta)} \right)^T$$

• Our goal:

$$tr[S_w] \downarrow \quad \text{and} \quad tr[S_b] \uparrow$$
Since we desire the trace of matrix $S_w$ to be as small as possible and at the same time the trace of $S_b$ to be as large as possible, the new cost function is formulated as:

$$D_{SDNMF}(X \| ZH) = D_{NMF}(X \| ZH) + \frac{\alpha}{2} tr[S_w] - \frac{\beta}{2} tr[S_b]$$
where $\alpha$ and $\beta$ are positive constants, while the factor $\frac{1}{2}$ is used to simplify subsequent derivations. Consequently, the new minimization problem is formulated as:

$$\min_{Z,H} D_{SDNMF}(X \| ZH)$$

subject to:

$$z_{i,k} \geq 0, \quad h_{k,j} \geq 0, \quad \sum_i z_{i,k} = 1, \quad \forall i, j, k.$$
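To make the two trace terms concrete, the sketch below computes $tr[S_w]$ and $tr[S_b]$ from clustered feature vectors, using the identity $tr[(a-b)(a-b)^T] = \|a-b\|^2$; the nested-list data layout (indexed by class, then cluster) is an assumption made for illustration:

```python
import numpy as np

def scatter_traces(features):
    """tr(S_w) and tr(S_b) for features[class][cluster] -> (N, M) array."""
    # cluster means mu^{(r)(theta)}
    means = [[c.mean(axis=0) for c in cls] for cls in features]
    # within-cluster scatter trace: squared distances to the own cluster mean
    tr_sw = sum(((c - mu) ** 2).sum()
                for cls, mus in zip(features, means)
                for c, mu in zip(cls, mus))
    # between-cluster scatter trace: squared distances between cluster
    # means belonging to *different* classes
    tr_sb = 0.0
    for i, mus_i in enumerate(means):
        for r, mus_r in enumerate(means):
            if r == i:
                continue
            for mu_ij in mus_i:
                for mu_rt in mus_r:
                    tr_sb += float(((mu_ij - mu_rt) ** 2).sum())
    return tr_sw, tr_sb
```

In the SDNMF cost these two traces enter with weights $\alpha/2$ and $-\beta/2$, respectively.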
• The constrained optimization problem is solved by introducing Lagrange multipliers:

$$\mathcal{L} = D_{NMF}(X \| ZH) + \frac{\alpha}{2} tr[S_w] - \frac{\beta}{2} tr[S_b] + \sum_i \sum_k \phi_{i,k} z_{i,k} + \sum_j \sum_k \psi_{j,k} h_{j,k}$$

$$\mathcal{L} = D_{NMF}(X \| ZH) + \frac{\alpha}{2} tr[S_w] - \frac{\beta}{2} tr[S_b] + tr[\Phi Z^T] + tr[\Psi H^T]$$
• Consequently, the optimization problem is equivalent to the minimization of the Lagrangian: $\arg\min_{Z,H} \mathcal{L}$

• To minimize $\mathcal{L}$, we first obtain its partial derivatives with respect to $z_{i,j}$ and $h_{i,j}$ and set them equal to zero:
πœ•β„’
=−
πœ•β„Žπ‘–,𝑗
πœ•β„’
=−
πœ•π‘§π‘–,𝑗
π‘˜
𝑙
π‘₯π‘˜,𝑗 π‘§π‘˜,𝑖
+
𝑧
β„Ž
𝑙 π‘˜,𝑙 𝑙,𝑗
π‘₯π‘˜,𝑗 π‘§π‘˜,𝑖
+
𝑧
β„Ž
𝑙 π‘˜,𝑙 𝑙,𝑗
𝑙
𝑙
𝛼 πœ•π‘‘π‘Ÿ 𝑆𝑀
𝛽 πœ•π‘‘π‘Ÿ 𝑆𝑏
𝑧𝑙,𝑖 + ѱ𝑖,𝑗 +
+
2 πœ•β„Žπ‘–,𝑗
2 πœ•π‘§π‘–,𝑗
𝛼 πœ•π‘‘π‘Ÿ 𝑆𝑀
𝛽 πœ•π‘‘π‘Ÿ 𝑆𝑏
𝑧𝑙,𝑖 + ѱ𝑖,𝑗 +
+
2 πœ•β„Žπ‘–,𝑗
2 πœ•π‘§π‘–,𝑗
• DNMF combines Fisher’s criterion with the NMF decomposition and achieves a more efficient decomposition of the provided data into its discriminant parts, thus enhancing separability between classes compared with conventional NMF.
$$V = [v_1, v_2, \ldots, v_m] \in \mathbb{R}^{1 \times m}$$

$\downarrow$ dimensionality reduction

$$U = [u_1, u_2, \ldots, u_n] \in \mathbb{R}^{1 \times n}, \quad \text{where } n \ll m$$

$$V B = U, \quad \text{where } B \in \mathbb{R}^{m \times n}$$

In the simplest case, $V B = U$ with $V \in \mathbb{R}^{1 \times 2}$, $B \in \mathbb{R}^{2 \times 1}$, $U \in \mathbb{R}^{1 \times 1}$.
• Introduction
• Principal Component Analysis (PCA) Method
• Non-negative Matrix Factorization (NMF) Method
• PCA-NMF Method
• Experiment Results and Analysis
• Conclusion
• In this paper, we have detailed PCA and NMF and applied them to feature extraction from facial expression images.
• We also process the basis image matrix and weight matrix produced by PCA and use them as the initialization of NMF.
• The experiments demonstrate that this method achieves a better recognition rate than PCA and NMF.
Suppose that $m$ expression images are selected for training; the training set $X$ is defined by

$$X = [x_1, x_2, \ldots, x_m] \in \mathbb{R}^{n \times m} \quad (1)$$
The covariance matrix corresponding to all training samples is obtained as

$$C = \sum_{i=1}^{m} (x_i - u)(x_i - u)^T \quad (2)$$

where $u$, the average face, is defined by

$$u = \frac{1}{m} \sum_{i=1}^{m} x_i \quad (3)$$
[Figure: a training data set $X \in \mathbb{R}^{n \times m}$ of $m = 4$ images $x_1, \ldots, x_4$, each with $n = row \times col$ pixels, averaged to give the average face $u$.]
Let

$$A = [x_1 - u, x_2 - u, \ldots, x_m - u] \quad (4)$$

Then (2) becomes

$$C = A A^T, \quad A \in \mathbb{R}^{n \times m}, \quad C \in \mathbb{R}^{n \times n} \quad (5)$$
Matrix $C$ has $n$ eigenvectors and eigenvalues. For a $50 \times 50$ image, $n = 50 \times 50 = 2500$, and it is difficult to compute 2500 eigenvectors and eigenvalues directly. Therefore, we obtain the eigenvectors and eigenvalues of $A A^T$ by solving for the eigenvectors and eigenvalues of $A^T A \in \mathbb{R}^{m \times m}$.
The vectors $v_i$ $(i = 1, 2, \ldots, r)$ and scalars $\lambda_i$ $(i = 1, 2, \ldots, r)$ are the eigenvectors and eigenvalues of $A^T A$. The eigenvectors $u_i$ of $A A^T$ are then given by

$$u_i = \frac{1}{\sqrt{\lambda_i}} A v_i, \quad i = 1, 2, \ldots, r \quad (6)$$
Sorting $\lambda_i$ by size, $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r > 0$, the first $t$ eigenvalues are kept so that they capture a fraction $\alpha$ of the total energy:

$$\frac{\sum_{i=1}^{t} \lambda_i}{\sum_{i=1}^{r} \lambda_i} \geq \alpha \quad (7)$$

where $\alpha$ is usually 0.9~0.99.
Let $W = [w_1, w_2, \ldots, w_t]$ be the projection matrix. Then every facial expression image feature can be denoted by the following equation:

$$g_i = W^T (x_i - u) \quad (8)$$
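Equations (1)-(8) can be sketched in a few lines of NumPy; the function name, the default $\alpha = 0.95$, and the numerical guard are illustrative assumptions (the $1/\sqrt{\lambda_i}$ normalization in (6) makes the eigenvectors unit-length):

```python
import numpy as np

def pca_features(X, alpha=0.95):
    """Eigenface-style PCA. X is n x m, one image per column."""
    u = X.mean(axis=1, keepdims=True)        # average face, eq. (3)
    A = X - u                                 # centered data, eq. (4)
    # eigendecompose the small m x m matrix A^T A instead of the n x n AA^T
    lam, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(lam)[::-1]             # eigenvalues in descending order
    lam, V = lam[order], V[:, order]
    # keep the t leading components holding a fraction alpha of the energy (7)
    t = int(np.searchsorted(np.cumsum(lam) / lam.sum(), alpha)) + 1
    # map back to eigenvectors of AA^T and normalize, eq. (6)
    W = (A @ V[:, :t]) / np.sqrt(np.maximum(lam[:t], 1e-12))
    G = W.T @ A                               # features g_i = W^T (x_i - u), eq. (8)
    return W, G, u
```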
[Figure: PCA basis images]
Given a non-negative matrix $X$, the NMF algorithms seek to find non-negative factors $B$ and $H$ of $X$ such that:

$$X_{n \times m} \approx B_{n \times r} H_{r \times m} \quad (9)$$

where $r$, the number of feature vectors, satisfies

$$r < \frac{n m}{n + m} \quad (10)$$
The iterative update formulae are given as follows:

$$H_{kj} \leftarrow H_{kj} \frac{(B^T X)_{kj}}{(B^T B H)_{kj}} \quad (11)$$

$$B_{ik} \leftarrow B_{ik} \frac{(X H^T)_{ik}}{(B H H^T)_{ik}} \quad (12)$$

Setting $A = BH$, the objective function is defined as

$$\min \| X - A \|^2 \quad (13)$$
Then every facial expression image feature can be denoted by the following equation:

$$g_i = W^{-1} x_i \quad (14)$$
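A minimal NumPy sketch of the multiplicative updates (11)-(12); the optional `B0`/`H0` arguments anticipate the PCA-based initialization described next, and the iteration count and `eps` guard are illustrative assumptions:

```python
import numpy as np

def nmf_frobenius(X, r, n_iter=500, eps=1e-12, B0=None, H0=None, seed=0):
    """Euclidean NMF: X (n x m) ~ B (n x r) @ H (r x m), updates (11)-(12)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    B = rng.random((n, r)) if B0 is None else B0.copy()
    H = rng.random((r, m)) if H0 is None else H0.copy()
    for _ in range(n_iter):
        H *= (B.T @ X) / (B.T @ B @ H + eps)   # update (11)
        B *= (X @ H.T) / (B @ H @ H.T + eps)   # update (12)
    return B, H
```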
[Figure: NMF basis images]
First, obtain the projection matrix $W$ and the weight matrix $G$ by the PCA method. The matrices $B$ and $H$ are then initialized as follows:

$$B_{ik} = \min(1, |W_{ik}|) \quad (15)$$

$$H_{kj} = \min(1, |G_{kj}|) \quad (16)$$
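Combining the two sketches above, the PCA-NMF initialization (15)-(16) might look like this; `X` is assumed to be the non-negative $n \times m$ training matrix, and `pca_features` and `nmf_frobenius` are the hypothetical helpers defined earlier:

```python
import numpy as np

# initialize the NMF factors from the PCA projection matrix W and
# weight matrix G, per (15)-(16), then refine with the NMF updates
W, G, u = pca_features(X, alpha=0.95)
B0 = np.minimum(1.0, np.abs(W))   # eq. (15)
H0 = np.minimum(1.0, np.abs(G))   # eq. (16)
B, H = nmf_frobenius(X, r=W.shape[1], B0=B0, H0=H0)
```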
[Figure: NMF basis images and PCA-NMF basis images for the seven expressions: anger, disgust, fear, happy, neutral, sad, surprise.]
[Chart: comparison of recognition rates for every expression (training set: 70 images; test set: 70 images)]

[Chart: comparison of recognition rates for every expression (training set: 70 images; test set: 143 images)]

[Chart: comparison of recognition rates for every expression (training set: 140 images; test set: 73 images)]
Discussion of the results:
The experimental results demonstrate that NMF and PCA-NMF can outperform PCA. The best recognition rate for facial expression images is 93.72%. On the whole, our approach provides good recognition rates.