THE HONG KONG POLYTECHNIC UNIVERSITY
Department of Electronic & Information Engineering

EIE520 Neural Computation

Lab 2: 2-D Vowel Recognition using Kernel-Based Neural Networks

A. Introduction:

There are many methods to perform pattern classification using neural networks. Among these methods, Gaussian mixture models (GMMs) and radial basis function (RBF) networks are two of the most promising neural models for this task. In this laboratory exercise, your task is to develop pattern classification systems based on GMMs and RBF networks. The systems should be able to recognize 10 vowels. You will use "netlab" [5] together with Matlab to create the GMMs and RBF networks.

B. Objectives:

You should have completed the following tasks by the end of this laboratory exercise:
1. Create GMMs and RBF networks to represent 10 different classes.
2. Perform pattern classification using the created networks.
3. Compare the GMM-based system against the RBF network-based system in terms of recognition accuracy, decision boundaries, training time, and recognition time by varying the number of kernels.
4. Find the decision boundaries and plot them on a 2-D plane.

C. Background:

C.1 GMM-Based Classifier

[Figure 1: K class-conditional GMMs $p(x_t|\omega_1), \ldots, p(x_t|\omega_K)$ operate on D-dimensional input vectors $x_t$; within class $i$, R component densities $p(x_t|\omega_i, \theta_{r|i})$ are weighted by the mixture coefficients $P(\theta_{r|i}|\omega_i)$, and a MAXNET selects $\arg\max_i p(x_t|\omega_i)$.]

Figure 1. Architecture of a GMM-based classifier

Figure 1 depicts the architecture of a K-class classifier in which each class is represented by a Gaussian mixture model (GMM). GMMs use a semi-parametric technique for approximating probability density functions (pdfs). The output of a GMM is the weighted sum of R component densities, as shown in Figure 1. Given a set of N independent and identically distributed patterns $X^{(i)} = \{x_t;\ t = 1, 2, \ldots, N\}$ associated with class $\omega_i$, we assume that the class likelihood function $p(x_t|\omega_i)$ is a mixture of Gaussian distributions, i.e.,

    $p(x_t|\omega_i) = \sum_{r=1}^{R} P(\theta_{r|i}|\omega_i)\, p(x_t|\omega_i, \theta_{r|i})$    (1)

where $\theta_{r|i}$ represents the parameters of the r-th mixture component, R is the total number of mixture components, $p(x_t|\omega_i, \theta_{r|i}) \equiv \mathcal{N}(\mu_{r|i}, \Sigma_{r|i})$ is the probability density function of the r-th component, and $P(\theta_{r|i}|\omega_i)$ is the prior probability (also called the mixture coefficient) of the r-th component. Typically, $\mathcal{N}(\mu_{r|i}, \Sigma_{r|i})$ is a Gaussian distribution. The mean vectors $\{\mu_{r|i}\}_{r=1}^{R}$, covariance matrices $\{\Sigma_{r|i}\}_{r=1}^{R}$, and mixture coefficients $\{P(\theta_{r|i}|\omega_i)\}_{r=1}^{R}$ are typically estimated by the EM algorithm. More specifically, the parameters of a GMM are estimated iteratively by

    $\mu_{r|i}^{(j+1)} = \dfrac{\sum_{t=1}^{N} P^{(j)}(\theta_{r|i}|x_t)\, x_t}{\sum_{t=1}^{N} P^{(j)}(\theta_{r|i}|x_t)}$,

    $\Sigma_{r|i}^{(j+1)} = \dfrac{\sum_{t=1}^{N} P^{(j)}(\theta_{r|i}|x_t)\, [x_t - \mu_{r|i}^{(j+1)}][x_t - \mu_{r|i}^{(j+1)}]^T}{\sum_{t=1}^{N} P^{(j)}(\theta_{r|i}|x_t)}$, and

    $P^{(j+1)}(\theta_{r|i}) = \dfrac{1}{N}\sum_{t=1}^{N} P^{(j)}(\theta_{r|i}|x_t)$,    $r = 1, \ldots, R$    (2)

where j denotes the iteration index, $P^{(j)}(\theta_{r|i}|x_t)$ is the posterior probability of the r-th mixture component ($r = 1, \ldots, R$), and T denotes matrix transpose. The posterior probabilities can be obtained from Bayes' theorem, yielding

    $P^{(j)}(\theta_{r|i}|x_t) = \dfrac{P^{(j)}(\theta_{r|i})\, p^{(j)}(x_t|\theta_{r|i})}{\sum_{k=1}^{R} P^{(j)}(\theta_{k|i})\, p^{(j)}(x_t|\theta_{k|i})}$    (3)

in which

    $p^{(j)}(x_t|\theta_{r|i}) = \dfrac{1}{(2\pi)^{D/2}\, |\Sigma_{r|i}^{(j)}|^{1/2}} \exp\left\{-\dfrac{1}{2}\,[x_t - \mu_{r|i}^{(j)}]^T (\Sigma_{r|i}^{(j)})^{-1} [x_t - \mu_{r|i}^{(j)}]\right\}$    (4)

where D is the input dimension.
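To make Eqs. (2)-(4) concrete, the following Matlab fragment sketches one EM iteration for a single class. It is illustrative only: netlab's gmmem performs these updates internally, and the variable names (X for the N-by-D data matrix, mu, Sigma, and prior for the current parameter estimates) are assumptions of this sketch, not part of the handout.

    % One EM iteration for an R-component GMM with full covariances.
    % X: N-by-D data; mu: R-by-D; Sigma: D-by-D-by-R; prior: 1-by-R.
    [N, D] = size(X);
    R = length(prior);
    post = zeros(N, R);
    for r = 1:R                                     % E-step, Eqs. (3)-(4)
        d   = X - repmat(mu(r,:), N, 1);            % x_t - mu_r for all t
        Si  = inv(Sigma(:,:,r));
        md  = sum((d * Si) .* d, 2);                % Mahalanobis distances
        lik = exp(-0.5 * md) / sqrt((2*pi)^D * det(Sigma(:,:,r)));
        post(:, r) = prior(r) * lik;                % numerator of Eq. (3)
    end
    post = post ./ repmat(sum(post, 2), 1, R);      % normalise (Bayes' theorem)
    for r = 1:R                                     % M-step, Eq. (2)
        w = post(:, r);
        mu(r, :) = (w' * X) / sum(w);               % new mean uses old posteriors
        d = X - repmat(mu(r,:), N, 1);
        Sigma(:,:,r) = (d' * (d .* repmat(w, 1, D))) / sum(w);
        prior(r) = sum(w) / N;
    end

Note that, as in Eq. (2), the covariance update uses the newly estimated mean of the same iteration.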
C.2 RBF Network-Based Classifier

In multi-layer perceptrons, the hidden neurons are based on linear basis function (LBF) nodes. Another type of hidden neuron is the radial basis function (RBF) neuron, which is the building block of RBF neural networks. In an RBF network, each neuron in the hidden layer computes a radial basis function that also serves as its activation function. The parameters of the hidden layer are the centres and the widths of these neurons, and the output functions are linear combinations of the radial basis functions. A more general form of the RBF network is the elliptical basis function (EBF) network, whose hidden neurons compute the Mahalanobis distance between the centres and the input vectors. It has been shown that RBF networks have the same asymptotic approximation power as multi-layer perceptrons.

[Figure 2: a K-output EBF network with D-dimensional inputs $x_t$, M basis functions $\phi_1(x_t), \ldots, \phi_M(x_t)$ partitioned into Groups 1 to K, output weights $w_{10}, w_{11}, \ldots, w_{KM}$, and outputs $y_1(x_t), \ldots, y_K(x_t)$.]

Figure 2. Architecture of a K-output EBF network

To apply RBF/EBF networks to pattern classification, each class is assigned a group of hidden units, and each group is trained independently using the data from the corresponding class. Figure 2 depicts the architecture of an RBF/EBF network with D inputs, M basis functions (hidden nodes), and K outputs. The input layer distributes the D-dimensional input patterns, $x_t$, to the hidden layer. Each hidden unit is a Gaussian basis function of the form

    $\phi_j(x_t) = \exp\left\{-\dfrac{1}{2\gamma_j}\,(x_t - \mu_j)^T \Sigma_j^{-1} (x_t - \mu_j)\right\}$,    $j = 1, \ldots, M$    (5)

where $\mu_j$ and $\Sigma_j$ are the mean vector and covariance matrix of the j-th basis function, respectively, and $\gamma_j$ is a smoothing parameter controlling the spread of the j-th basis function. The k-th output is a linear weighted sum of the basis functions' outputs, i.e.,

    $y_k(x_t) = w_{k0} + \sum_{j=1}^{M} w_{kj}\, \phi_j(x_t)$,    $t = 1, \ldots, N$ and $k = 1, \ldots, K$    (6)

where $x_t$ is the t-th input vector and $w_{k0}$ is a bias term. In matrix form, (6) can be written as $Y = \Phi W$, where $Y$ is an $N \times K$ matrix, $\Phi$ is an $N \times (M+1)$ matrix, and $W$ is an $(M+1) \times K$ matrix. The weight matrix W is the least squares solution of the matrix equation

    $\Phi W = D$    (7)

where D is an $N \times K$ target matrix containing the desired output vectors in its rows. As $\Phi$ is not a square matrix, one reliable way to solve (7) is to use the technique of singular value decomposition. In this approach, the matrix $\Phi$ is decomposed into the product $U \Lambda V^T$, where $U$ is an $N \times (M+1)$ column-orthogonal matrix, $\Lambda$ is an $(M+1) \times (M+1)$ diagonal matrix containing the singular values, and $V$ is an $(M+1) \times (M+1)$ orthogonal matrix. The weight vectors $\{w_k\}_{k=1}^{K}$ are given by

    $w_k = V \Lambda^{-1} U^T d_k$    (8)

where $d_k$ is the k-th column of D. For an over-determined system, singular value decomposition gives the solution that is the best approximation in the least squares sense.
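The following Matlab fragment sketches how Eq. (7) can be solved via Eq. (8). It is illustrative only: netlab's rbftrain computes the output weights for you, and the variable names Phi (the N-by-(M+1) design matrix of basis-function outputs plus a bias column of ones) and Dmat (the N-by-K target matrix) are assumptions of this sketch.

    % Least squares solution of Phi * W = Dmat by singular value decomposition.
    [U, S, V] = svd(Phi, 0);           % economy-size SVD: Phi = U*S*V'
    s = diag(S);                       % singular values
    s(s > 1e-10) = 1 ./ s(s > 1e-10);  % invert non-zero singular values only;
                                       % near-zero values are left, which
                                       % suppresses their (unstable) contribution
    W = V * diag(s) * U' * Dmat;       % Eq. (8), applied to all K columns of D
    Y = Phi * W;                       % network outputs, Eq. (6)

Thresholding the singular values before inversion is what makes the SVD solution reliable when Phi is ill-conditioned.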
D. Procedures:

D.1 GMM-Based Classifier

1) Download the "netlab" software from http://www.ncrg.aston.ac.uk/netlab/ and save the m-files in your working directory. Download the training data and testing data from M.W. Mak's home page http://www.eie.polyu.edu.hk/~mwmak/SA/2DVowel.zip. This is the vowel data you will use in this laboratory exercise.

2) Open Matlab, go to "File" -> "Set Path" and add the directory where "netlab" was saved.

3) Import and save the training data, 2DVowel_train.pat, to a 2-D array. The imported matrix should be 338×12 in size. The first 2 columns contain the training feature vectors in a 2-D input space, and the 3rd to 12th columns indicate the class to which each pattern belongs.

4) After importing the data, create and initialise the GMMs by using the functions gmm and gmminit, respectively. Set the number of centres to 3 and the covariance type to diagonal ('diag' in netlab) first. A model can be created by

    model_name = gmm(data_dimension, no_of_centres, covariance_type)

and initialised by

    model_name = gmminit(model_name, data, options)

5) Then, use the EM algorithm gmmem to train the models. An example program, creation_sample.m, is provided to demonstrate this training process.

6) Create 10 GMMs to represent the 10 vowels using the data from the 10 different classes. It is recommended to separate the data into 10 files.

7) Plot the imported training data together with the centres obtained after EM training.

8) Now, import the testing data, 2DVowel_test.pat. This file is for you to test the classification rate of the GMMs you have just created. The file contains 333 data points, and again each point belongs to one of the 10 classes. For a given data point, the likelihood that it belongs to a particular class can be calculated by the function gmmprob. Each data point is classified to the class whose corresponding likelihood is the highest (a sketch of this pipeline is given at the end of this section). The overall classification rate is calculated by:

    Classification rate = (Number of correctly classified points / Total number of data points) × 100%

9) Now, try different numbers of centres and different covariance types (diagonal and full) when creating the models. Find the combination that gives the highest classification rate. What is the optimal combination and what is the corresponding classification rate?

10) Suppose you have already obtained the optimal number of centres and covariance type for the models. You should now find the decision boundaries, which separate the 10 classes in the 2-D space. You can do this by finding the class difference in the x-y plane; a sample program, decision_boundary_sample.m, is included in this exercise to demonstrate the procedure of finding the boundaries (see also the boundary-plotting sketch at the end of this section). What happens at the edges of the x-y plane? Explain the phenomenon.

D.2 RBF Network-Based Classifier

1. In this part, you will repeat D.1 using RBF networks. Again, start by importing the training data into a 338×12 array.

2. After importing the data, separate it into 2 parts: the data part, which is 338×2 in size, and the class IDs, which are 338×10 in size.

3. Instead of creating 10 different RBF networks as in Part D.1, you should create one RBF network. To create an RBF network, use the function rbf. To specify the network architecture, you must provide the number of inputs, the number of hidden units, and the number of output units.

4. After that, initialise the RBF network by calling the function rbfsetbf. You need to specify a number of option fields, as for gmm in Part D.1. Before performing classification, call the function rbftrain to train the RBF network you have just created. You have to specify the targets, which contain the class information. At this stage, all the training processes should have finished, and the next step is to perform classification.

5. Now, import the testing data and use the function rbffwd to perform classification. This function has 2 input arguments: one is the RBF network to be used for classification, and the other is a matrix of test vectors, one per row. In this exercise, each row has 2 fields: the x location and the y location. The output again has one row per test vector, whose size equals the number of outputs you specified in Step 3. For each test vector, the class ID is determined by selecting the network output whose response to the test vector is the highest.

6. Compute the classification rate over the whole testing set. Try different numbers of hidden units and select the optimal one (the sketch at the end of this section shows the basic calls). What is the optimal number of hidden units and what is the corresponding classification rate? Compare and explain the classification performance of the RBF networks relative to that of the GMMs.

7. After successfully selecting an optimal number of hidden units, plot the decision boundaries. Again, this can be done by finding the class difference in the x-y plane. Record your results and compare the whole process with that of the GMMs in terms of training time and recognition time.
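For reference, the fragment below sketches the D.1 pipeline end-to-end. It is a minimal sketch, not the required solution: the variable names (Xi, mixes, Xtest, true_id) and the option settings are assumptions of this sketch, and creation_sample.m remains the authoritative example.

    % Train one GMM per class (shown for class i; repeat for i = 1..10).
    % Xi is assumed to hold the training vectors of class i, one per row.
    options     = zeros(1, 18);                  % netlab option vector
    options(14) = 20;                            % max. k-means/EM iterations
    mixes{i} = gmm(2, 3, 'diag');                % 2-D inputs, 3 centres, diagonal cov.
    mixes{i} = gmminit(mixes{i}, Xi, options);   % k-means initialisation
    mixes{i} = gmmem(mixes{i}, Xi, options);     % EM training, Eqs. (2)-(4)

    % Classify the test set: pick the class with the highest likelihood.
    % Xtest is Ntest-by-2; true_id is an assumed Ntest-by-1 label vector.
    lik = zeros(size(Xtest, 1), 10);
    for i = 1:10
        lik(:, i) = gmmprob(mixes{i}, Xtest);    % p(x_t | class i), Eq. (1)
    end
    [maxlik, predicted] = max(lik, [], 2);       % winning class per test vector
    rate = mean(predicted == true_id) * 100;     % classification rate in %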
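The decision boundaries of D.1 step 10 can be visualised by evaluating every class model on a regular grid and marking where the winning class changes. decision_boundary_sample.m demonstrates the full procedure; the fragment below only sketches the idea, and the grid range is an assumption to be adapted to the data.

    % Evaluate all 10 trained GMMs (mixes, from the previous sketch) on a grid.
    [gx, gy] = meshgrid(linspace(-4, 4, 200), linspace(-4, 4, 200));
    pts = [gx(:) gy(:)];
    lik = zeros(size(pts, 1), 10);
    for i = 1:10
        lik(:, i) = gmmprob(mixes{i}, pts);
    end
    [tmp, winner] = max(lik, [], 2);             % winning class at each grid point
    % Boundaries are where the label map changes value; contouring at
    % half-integer levels draws each class-to-class transition.
    contour(gx, gy, reshape(winner, size(gx)), 0.5:1:9.5);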
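Similarly, the D.2 pipeline reduces to a handful of netlab calls. Again a minimal sketch under assumed variable names (X, T, Xtest, true_id) and illustrative settings: 30 hidden units is only a starting point for the search in step 6, not a recommended value.

    % X is the 338-by-2 data part and T the 338-by-10 class-ID part (step 2).
    options     = zeros(1, 18);            % netlab option vector
    options(14) = 20;                      % max. number of iterations
    net = rbf(2, 30, 10, 'gaussian');      % 2 inputs, 30 hidden units, 10 outputs
    net = rbfsetbf(net, options, X);       % fit the basis functions to the data
    net = rbftrain(net, options, X, T);    % solve for the output weights, Eq. (8)

    % Classification: the predicted class is the index of the largest output.
    Y = rbffwd(net, Xtest);                % one row of 10 outputs per test vector
    [maxval, predicted] = max(Y, [], 2);
    rate = mean(predicted == true_id) * 100;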
E. References:

1. Netlab software, http://www.ncrg.aston.ac.uk/netlab/
2. C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995.
3. K. K. Yiu, M. W. Mak, and C. K. Li, "Gaussian Mixture Models and Probabilistic Decision-Based Neural Networks for Pattern Classification: A Comparative Study," Neural Computing and Applications, vol. 8, pp. 235-245, 1999.
4. M. W. Mak and S. Y. Kung, "Estimation of Elliptical Basis Function Parameters by the EM Algorithm with Application to Speaker Verification," IEEE Trans. on Neural Networks, vol. 11, no. 4, pp. 961-969, July 2000.
5. I. T. Nabney, Netlab: Algorithms for Pattern Recognition, Springer, 2002.