Part 6c - Joensuu

Principal Component Analysis
Jana Ludolph
Martin Pokorny
University of Joensuu
Dept. of Computer Science
P.O. Box 111
FIN- 80101 Joensuu
Tel. +358 13 251 7959
fax +358 13 251 7955
www.cs.joensuu.fi
PCA overview
• Method objectives:
  – Data dimensionality reduction
  – Clustering
• Extract the variables whose properties are constitutive for the data
• Reduce the dimension with minimal loss of information
• History:
  – Pearson 1901
  – Established in 1930 by Harold Hotelling
  – In practical use since the 1970s (high-performance computers)
• Application:
– Face recognition
– Image processing
– Artificial intelligence (neural networks)
• This material is a PPT version of [1] with some changes
Statistical background (1/3)
• Variance
– Measure of the spread of data in a data set
– Example:
Data set 1 = [0, 8, 12, 20], Mean = 10, Variance = 52
Data set 2 = [8, 9, 11, 12], Mean = 10, Variance = 2.5
s^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2

(there is also a version with n − 1 in the denominator)
• Standard deviation
– Square root of the variance
– Example:
Data set 1, Std. deviation = 7.21
Data set 2, Std. deviation = 1.58
s = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2}

(there is also a version with n − 1 in the denominator)
[Figure: plot of Data set 1 and Data set 2, illustrating the different spreads around the common mean 10]
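A minimal NumPy sketch (added here, not part of the original slides) that reproduces the variance and standard deviation figures above, using the 1/n (population) convention:

```python
import numpy as np

data1 = np.array([0, 8, 12, 20])
data2 = np.array([8, 9, 11, 12])

for name, data in [("Data set 1", data1), ("Data set 2", data2)]:
    mean = data.mean()
    var = ((data - mean) ** 2).mean()   # 1/n version; use the (n-1) version for samples
    std = np.sqrt(var)
    print(f"{name}: mean={mean:.2f}, variance={var:.2f}, std={std:.2f}")
# Expected: Data set 1 -> variance 52.00, std 7.21; Data set 2 -> variance 2.50, std 1.58
```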
Statistical background (2/3)
• Covariance
– Variance and standard deviation operate on 1 dimension, independently of the other dimensions
– Covariance: a similar measure to find out how much the dimensions vary from the mean with respect to each other
– Covariance is measured between 2 dimensions
– Covariance between the X and Y dimensions:
\mathrm{cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})
– Also a version with n − 1
– Result: the value is not as important as its sign
  (+/−: see the examples below; 0 means the two dimensions are uncorrelated, i.e. vary independently of each other)
– Covariance between one dimension and itself: cov(X, X) = variance(X)
– Student example, cov = +4.4: the more study hours, the higher the grading
– Sport example, cov = −140: the more training days, the lower the weight

[Figure: scatter plots of grading vs. study hours and weight vs. training days]
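A small Python sketch of the covariance formula above (added, not from the slides); the demo data is the six-point data set used in the PCA example later in this deck:

```python
import numpy as np

def cov(x, y):
    """Population covariance (1/n version), matching the slide's formula."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return ((x - x.mean()) * (y - y.mean())).mean()

# Data from the PCA example slides (points A..F)
X = [1, 2, 4, 5, 5, 8]
Y = [1, 1, 5, 5, 6, 5]

print(cov(X, Y))   # positive -> X and Y tend to increase together (about 3.69)
print(cov(X, X))   # equals the variance of X (about 5.14)
```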
Statistical background (3/3)
• Covariance matrix
  – All possible covariance values between all the dimensions
  – Matrix for X, Y, Z dimensions:
C = \begin{pmatrix} \mathrm{cov}(X,X) & \mathrm{cov}(X,Y) & \mathrm{cov}(X,Z) \\ \mathrm{cov}(Y,X) & \mathrm{cov}(Y,Y) & \mathrm{cov}(Y,Z) \\ \mathrm{cov}(Z,X) & \mathrm{cov}(Z,Y) & \mathrm{cov}(Z,Z) \end{pmatrix}
  – Matrix properties:
    1) If the number of dimensions is n, the matrix is n × n.
    2) Down the main diagonal the covariance is between a dimension and itself, i.e. the variance of that dimension.
    3) cov(A, B) = cov(B, A), so the matrix is symmetric about the main diagonal.
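A hedged NumPy sketch (added, not from the slides) that builds a covariance matrix for three dimensions; np.cov with bias=True matches the 1/n convention used here, and the data values are illustrative only:

```python
import numpy as np

# Three-dimensional toy data: one row per dimension (X, Y, Z), one column per sample.
data = np.array([
    [2.1, 2.5, 3.6, 4.0],    # X
    [8.0, 10.0, 12.0, 14.0], # Y
    [1.0, 0.8, 0.6, 0.2],    # Z
])

C = np.cov(data, bias=True)   # bias=True -> divide by n, as in the slides
print(C)                      # 3 x 3, symmetric, variances on the diagonal
```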
Matrix algebra background (1/2)
• Eigenvectors and eigenvalues
– Example of an eigenvector:

\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4 \begin{pmatrix} 3 \\ 2 \end{pmatrix}

(3, 2) is the eigenvector; 4 is the eigenvalue associated with the eigenvector.

– Example of a non-eigenvector:

\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 11 \\ 5 \end{pmatrix}
– 1st example: the resulting vector is an integer multiple of the original
vector
– 2nd example: the resulting vector is not an integer multiple of the
original vector
– Eigenvector (3,2) represents an arrow pointing from the origin (0,0)
to the point (3,2)
– The square matrix is a transformation matrix; multiplying a vector by it transforms the vector from its original position
– How to obtain the eigenvectors and eigenvalues easily:
use a math library, for example Matlab:
[V, D] = eig(B); V: eigenvectors, D: eigenvalues, B: square matrix
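The NumPy equivalent (added as a sketch, not from the slides) of the Matlab call above, applied to the example matrix:

```python
import numpy as np

B = np.array([[2.0, 3.0],
              [2.0, 1.0]])

eigenvalues, eigenvectors = np.linalg.eig(B)  # columns of `eigenvectors` are the eigenvectors
print(eigenvalues)    # 4 and -1 for this matrix
print(eigenvectors)   # each column is returned with unit length
```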
Matrix algebra background (2/2)
• Eigenvector properties:
  – Can be found only for square matrices
  – Not every square matrix has eigenvectors
  – For an n × n matrix that does have eigenvectors, there are n of them
  – If an eigenvector is scaled before the multiplication, the result is still the same multiple of it:
 3  6
2      
 2  4
 2 3   6   24 
 6

        4   
 2 1   4   16 
 4
– All the eigenvectors of a symmetric matrix (such as a covariance matrix) are perpendicular, i.e. at right angles to each other, no matter how many dimensions there are.
  Important because it means the data can be expressed in terms of the perpendicular eigenvectors, instead of expressing it in terms of the x and y axes.
– Standard eigenvector: an eigenvector whose length is 1

vector: \begin{pmatrix} 3 \\ 2 \end{pmatrix}, \qquad length: \sqrt{3^2 + 2^2} = \sqrt{13}

standard vector: \begin{pmatrix} 3 \\ 2 \end{pmatrix} / \sqrt{13} = \begin{pmatrix} 3/\sqrt{13} \\ 2/\sqrt{13} \end{pmatrix}
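A short NumPy check (added, not from the slides) of the two properties above for a symmetric covariance-like matrix: the eigenvectors come out perpendicular, and normalizing (3, 2) gives its unit-length ("standard") version:

```python
import numpy as np

# Symmetric example matrix (covariance matrices are always symmetric)
C = np.array([[5.139, 3.694],
              [3.694, 4.139]])
w, V = np.linalg.eigh(C)            # eigh: eigendecomposition for symmetric matrices
print(np.dot(V[:, 0], V[:, 1]))     # ~0 -> the eigenvectors are perpendicular

v = np.array([3.0, 2.0])
print(v / np.linalg.norm(v))        # unit-length version of (3, 2)
```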
Using PCA in divisive clustering
1) Calculate the principal axis
   • Choose the eigenvector of the covariance matrix with the highest eigenvalue.
2) Select the dividing point along the principal axis
   • Try each vector as the dividing point and select the one with the lowest distortion.
3) Divide the vectors according to a hyperplane perpendicular to the principal axis
4) Calculate the centroids of the two sub-clusters (a code sketch of these steps follows below)
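A compact Python sketch (added, not part of the slides) of the four steps, assuming distortion means the mean squared distance of a cluster's points to its centroid; the slides do not spell out the exact distortion definition:

```python
import numpy as np

def pca_split(points):
    """One divisive-clustering split guided by PCA (sketch)."""
    X = np.asarray(points, float)
    centered = X - X.mean(axis=0)

    # 1) Principal axis: eigenvector of the covariance matrix with the largest eigenvalue
    C = np.cov(centered.T, bias=True)
    w, V = np.linalg.eigh(C)
    axis = V[:, np.argmax(w)]

    # 2) Project onto the principal axis and try each split position along it
    proj = centered @ axis
    order = np.argsort(proj)

    def distortion(cluster):
        c = cluster.mean(axis=0)
        return ((cluster - c) ** 2).sum(axis=1).mean()

    best = None
    for k in range(1, len(X)):                 # split after the k-th sorted point
        left, right = X[order[:k]], X[order[k:]]
        D = distortion(left) + distortion(right)
        if best is None or D < best[0]:
            best = (D, k)

    # 3) Divide by the hyperplane implied by the best split, 4) return the two centroids
    D, k = best
    left, right = X[order[:k]], X[order[k:]]
    return left, right, left.mean(axis=0), right.mean(axis=0)

# Example with the data from the PCA example slides
points = [(1, 1), (2, 1), (4, 5), (5, 5), (5, 6), (8, 5)]
left, right, c1, c2 = pca_split(points)
print(left, right)   # expected split: {A, B} vs {C, D, E, F}
print(c1, c2)
```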
PCA Example (1/5)
1) Calculate Principal Component
Step 1.1: Get some Data
Step 1.2: Subtract the mean
x̄ = 4.17, ȳ = 3.83

Point   X   Y   X − x̄   Y − ȳ
A       1   1   −3.17   −2.83
B       2   1   −2.17   −2.83
C       4   5   −0.17    1.17
D       5   5    0.83    1.17
E       5   6    0.83    2.17
F       8   5    3.83    1.17

[Figure: scatter plots of the original data and the mean-subtracted data]
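A NumPy sketch (added) of steps 1.1 and 1.2 for the six points above:

```python
import numpy as np

# Step 1.1: the data (points A..F)
data = np.array([[1, 1], [2, 1], [4, 5], [5, 5], [5, 6], [8, 5]], dtype=float)

# Step 1.2: subtract the mean of each dimension
mean = data.mean(axis=0)          # [4.17, 3.83]
centered = data - mean
print(mean)
print(centered)                   # matches the X − x̄ and Y − ȳ columns above
```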
PCA Example (2/5)
Step 1.3: Covariance matrix calculation
 5.139 3.694  Positive covij values

C  
3
.
694
4
.
139

 → x and y values increase together in dataset
Step 1.4: Eigenvectors and eigenvalues calculation – principal axis
a) Calculate the eigenvalues λ of the matrix C

C - \lambda E = \begin{pmatrix} 5.139 - \lambda & 3.694 \\ 3.694 & 4.139 - \lambda \end{pmatrix}, \quad \text{where } E \text{ is the identity matrix}

The characteristic polynomial is the determinant of C − λE; setting it equal to zero, its roots are the eigenvalues:

\det(C - \lambda E) = (5.139 - \lambda)(4.139 - \lambda) - 3.694^2 = \lambda^2 - 9.278\lambda + 7.620

\lambda_1 = 8.367, \qquad \lambda_2 = 0.911

Note: For bigger matrices (when the original data has more than 3 dimensions), the calculation of the eigenvalues gets harder. Use, for example, the power method. [4]
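Continuing the sketch (added): the covariance matrix and its eigenvalues for the centered data, using NumPy rather than solving the characteristic polynomial by hand:

```python
import numpy as np

data = np.array([[1, 1], [2, 1], [4, 5], [5, 5], [5, 6], [8, 5]], dtype=float)
centered = data - data.mean(axis=0)

C = np.cov(centered.T, bias=True)   # 1/n convention, as in the slides
print(C)                            # [[5.139, 3.694], [3.694, 4.139]]

eigenvalues, eigenvectors = np.linalg.eigh(C)
print(eigenvalues)                  # [0.911, 8.367] (eigh returns them in ascending order)
```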
PCA Example (3/5)
b) Calculate the eigenvectors v1 and v2 from the eigenvalues λ1 and λ2 via
the properties of eigenvectors (see Matrix algebra background (2/2))
\begin{pmatrix} 5.139 & 3.694 \\ 3.694 & 4.139 \end{pmatrix} \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = 8.367 \begin{pmatrix} x_1 \\ y_1 \end{pmatrix}
\qquad
\begin{pmatrix} 5.139 & 3.694 \\ 3.694 & 4.139 \end{pmatrix} \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = 0.911 \begin{pmatrix} x_2 \\ y_2 \end{pmatrix}

v_1 = \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} 0.753 \\ 0.658 \end{pmatrix}
\qquad
v_2 = \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} 0.658 \\ -0.753 \end{pmatrix}
[Figure: mean-subtracted data with the eigenvectors v1 and v2 drawn from the origin]

Eigenvector v1, the one with the highest eigenvalue, fits the data best. This is our principal component.
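The corresponding NumPy sketch (added) for step b; eigh already returns unit-length eigenvectors, so only the ordering by eigenvalue is needed:

```python
import numpy as np

C = np.array([[5.139, 3.694],
              [3.694, 4.139]])
w, V = np.linalg.eigh(C)      # ascending eigenvalues; columns of V are unit eigenvectors
v1 = V[:, np.argmax(w)]       # principal component, ~(0.753, 0.658) up to sign
v2 = V[:, np.argmin(w)]       # ~(0.658, -0.753) up to sign
print(w, v1, v2)
```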
PCA Example (4/5)
2) Select the dividing point along the principal axis

Step 2.1: Calculate the projections of the data points on the principal axis
Step 2.2: Sort the points according to their projections

[Figure: data points projected onto the principal axis and then sorted along it]
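A sketch (added) of steps 2.1 and 2.2, projecting the centered points onto v1 and sorting:

```python
import numpy as np

data = np.array([[1, 1], [2, 1], [4, 5], [5, 5], [5, 6], [8, 5]], dtype=float)
labels = list("ABCDEF")
centered = data - data.mean(axis=0)

v1 = np.array([0.753, 0.658])               # principal component from the previous step

proj = centered @ v1                        # step 2.1: scalar projection onto v1
order = np.argsort(proj)                    # step 2.2: sort by projection
print([(labels[i], round(proj[i], 2)) for i in order])
```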
PCA Example (5/5)
Step 2.3: Try each vector as the dividing point, calculate the distortion, and choose the lowest

Dividing point A: D1 = 0.25, D2 = 2.44, D = D1 + D2 = 2.69
Dividing point B: D1 = 5.11, D2 = 2.67, D = D1 + D2 = 7.78

2.69 < 7.78 → take A as the dividing point.

[Figure: the two candidate splits, showing the data point projections, the centroids, the dividing point, the hyperplane perpendicular to the principal component, and the resulting clusters]
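A final sketch (added) of step 2.3, assuming distortion means the mean squared distance of each cluster's points to its centroid; the slides do not define it explicitly, so intermediate numbers may differ slightly:

```python
import numpy as np

data = np.array([[1, 1], [2, 1], [4, 5], [5, 5], [5, 6], [8, 5]], dtype=float)
centered = data - data.mean(axis=0)
v1 = np.array([0.753, 0.658])
order = np.argsort(centered @ v1)           # points sorted along the principal axis

def distortion(cluster):
    c = cluster.mean(axis=0)
    return ((cluster - c) ** 2).sum(axis=1).mean()

# Try each split position along the sorted order and keep the lowest total distortion
best_k, best_D = min(
    ((k, distortion(data[order[:k]]) + distortion(data[order[k:]]))
     for k in range(1, len(data))),
    key=lambda kd: kd[1])
print(best_k, round(best_D, 2))             # splitting {A, B} from {C, D, E, F} gives D = 2.69
```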
References
[1] L. I. Smith: A Tutorial on Principal Components Analysis. Student tutorial, 2002.
    http://csnet.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
[2] http://de.wikipedia.org/wiki/Hauptkomponentenanalyse (German Wikipedia: principal component analysis)
[3] http://de.wikipedia.org/wiki/Eigenvektor (German Wikipedia: eigenvector)
[4] R. L. Burden and J. D. Faires: Numerical Analysis, third edition. Prindle, Weber & Schmidt, Boston, 1985. (p. 457)