Diagnosis of Parkinson's Disease Using ANN and SVM Models

Y Cuskun1, K Kaplan1, H M Ertunc1
yasincuskun@gmail.com, kaplan.kaplan@kocaeli.edu.tr, hmertunc@kocaeli.edu.tr
1 Department of Mechatronics, Sensor Laboratory, Kocaeli, 41380, TURKEY

Abstract

Parkinson's is a chronic neurological disorder caused by a lack of the substance 'dopamine' in the brain. Individuals with this disease may show symptoms such as movement disorders, postural and balance disorders, changes in speech and changes in handwriting. In this study, speech changes were used to identify the disease. Speech data of diseased and healthy individuals were recorded, and 22 speech features were extracted for each individual from these recordings. The extracted features are used to classify individuals as diseased or healthy by means of ANN (Artificial Neural Network) and SVM (Support Vector Machine) classification methods, which are widely used in machine learning. The purpose of the classification is to identify individuals with Parkinson's disease and to provide a decision-support mechanism for medical doctors. In the experimental studies, Parkinson's disease was classified with 96.146% accuracy by the SVM model and 94.71% accuracy by the ANN model.

Keywords: Diagnosis of Parkinson's Disease, Classification, ANN, SVM

2. Introduction

Parkinson's disease is a chronic neurological disease that occurs in the brain with the lack of a substance called 'dopamine'. The disease was first described by the British physician James Parkinson in 1817 and characterized as 'shaking or trembling'. This neurological disorder affects more than 4 million people worldwide, and an estimated 200 thousand people suffer from it in Turkey [1]. Parkinson's is the second most common neurodegenerative disease after Alzheimer's disease.
The disease can cause movement disorders such as tremor and slowness of movement, muscular rigidity, posture and balance disorders, and changes in speech and handwriting. It is not possible to cure the disease with medical treatment, but the symptoms can be brought under control with medication and some surgical interventions. For this reason, in this study, diseased individuals were distinguished from healthy individuals by considering the speech disorders seen in patients. In the study, 22 features were extracted from the speech data of 195 diseased and healthy individuals. Then, using the ANN and SVM models widely used in machine learning, diseased individuals were distinguished from healthy individuals with these features. The aim of the study is to help patients control their symptoms even though it is not possible to cure the disease [1].

E-ISBN: 978-605-68882-5-0

In some studies in the literature, classifications for Parkinson's disease have been made using methods such as stacked autoencoders (SAE) and probabilistic neural networks (PNN) [2, 3]. In this study, the ANN and SVM classification methods, which are commonly used in machine learning and usually give successful results, have been used for the diagnosis of Parkinson's disease.

3. Theoretical Background

3.1. Artificial Neural Network

Artificial neural networks are computer systems that simulate the learning function, the most basic feature of the human brain. They perform the learning process with the help of samples. These networks consist of interconnected artificial nerve cells, and each connection has a weight value. The information an artificial neural network possesses is stored in these weight values and distributed across the network [4]. The learning process takes place when these weight values are updated. Artificial neural networks usually consist of three layers.
From these layers, the inputs of the system are received at the input layer; this information is processed in the hidden layer and transmitted to the output layer. In this process, the model inputs are first multiplied by initially random weights, and the value of each neuron in the hidden layer is determined as follows:

net_j = \sum_{i=1}^{d} w_i x_i + b \qquad (1)

where net_j is the value of each neuron, d is the number of inputs, x is the input, b is the bias term, and w is the weight. After the hidden layer values are calculated, the output of each hidden neuron is found with the aid of the activation function f:

y = f(net_j) \qquad (2)

The activation function may be sigmoid, tangent or hyperbolic tangent, in accordance with the structure of the system. After the value of each neuron in the hidden layer is found, the same operations are repeated to obtain the model output. If the model is complex, the number of hidden layers can be increased. Figure 1 shows an artificial neural network model with a single hidden layer.

Figure 1. Single layer Artificial Neural Network model.

The model usually learns by means of the backpropagation algorithm. In the backpropagation algorithm, the output value is first computed with the initially random weight values. Then, using this output value, the weights are updated as follows:

w_{new} = w_{old} + \Delta w \qquad (3)

where the value \Delta w is

\Delta w = -\eta \frac{\partial J}{\partial w} + \alpha \, \Delta w_{old} \qquad (4)

Here \eta is called the learning rate and \alpha the momentum coefficient. J denotes the error between the output value and the expected output value. These steps are repeated for each training sample to find the most appropriate weight values for the model. In this way, the learning of the model is realized [6].

3.2.
Support Vector Machines

Classification with SVM is generally used for binary problems. In binary classification, a decision function is used to search for the most appropriate hyperplane that can separate the training data. The aim is to maximize the distance between the hyperplane and the nearest points. As shown in Fig. 2, the hyperplane that maximizes this margin is called the optimal hyperplane, and the points that bound the margin width are called the support vectors [7].

Figure 2. Optimum hyperplane and support vectors.

The hyperplanes passing through these support vectors are given by the following equation:

w \cdot x_i + b = \pm 1 \qquad (5)

where w is the weight vector (the hyperplane normal) and b is the bias value [8]. To maximize the margin, the expression ||w|| must be minimal. In this case, the following constrained optimization problem must be solved:

\min \left[ \frac{1}{2} ||w||^2 \right] \qquad (6)

The related constraints are expressed in the form [9]:

y_i (w \cdot x_i + b) - 1 \geq 0 \quad \text{and} \quad y_i \in \{-1, +1\} \qquad (7)

If this optimization problem is solved with Lagrange multipliers, the following equality is obtained:

L(w, b, \alpha) = \frac{1}{2} ||w||^2 - \sum_{i=1}^{N} \alpha_i y_i (w \cdot x_i + b) + \sum_{i=1}^{N} \alpha_i \qquad (8)

As a result, for a two-class problem that can be separated linearly, the decision function can be written as [9]:

f(x) = \mathrm{sign}\left( \sum_{i=1}^{N} \alpha_i y_i (x \cdot x_i) + b \right) \qquad (9)

In some cases, the training data cannot be separated linearly; such cases are handled by introducing a positive slack variable, as in Fig. 3. The trade-off between maximizing the margin and minimizing the error is controlled with a parameter C that takes positive values [10].

Figure 3. Hyperplane identification for non-linear data sets.
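To make the margin constraints of Eq. (7) concrete, the following sketch checks them for a made-up 2-D data set and hyperplane (all values here are illustrative, not taken from the paper's experiments):

```python
import numpy as np

# Hypothetical separating hyperplane w.x + b = 0 and four labelled points.
w = np.array([1.0, 1.0])
b = -3.0
X = np.array([[1.0, 1.0], [0.0, 1.0], [3.0, 2.0], [2.0, 3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

# Eq. (7): every point must satisfy y_i * (w . x_i + b) - 1 >= 0;
# points where y_i * (w . x_i + b) equals exactly 1 lie on the margin
# and are the support vectors.
margins = y * (X @ w + b)
print(np.all(margins - 1 >= 0))     # -> True
```

Here the first point attains a margin of exactly 1, so it would act as a support vector of this toy hyperplane.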
When the regularization parameter and the slack variables are added, the optimization problem for non-linearly separable data classes becomes:

\min \left[ \frac{1}{2} ||w||^2 + C \sum_{i=1}^{N} \xi_i \right] \qquad (10)

The related constraints are:

y_i (w \cdot \Phi(x_i) + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, N \qquad (11)

To solve the optimization problem expressed in Eqs. (10) and (11), the data that cannot be separated linearly in the input space is mapped, as shown in Fig. 4, to a higher-dimensional space, where it is separated linearly to determine the hyperplane.

Figure 4. Converting the data to a higher dimension with the kernel function.

In the mathematical model of the SVM, the classification can be carried out linearly in the mapped space by using a kernel function expressed as K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j). In this case, the decision function can be written as follows [10]:

f(x) = \mathrm{sign}\left( \sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b \right) \qquad (12)

4. Method

Speech data from Parkinson's patients and healthy individuals was recorded, and 22 features were extracted from these data. The features were obtained from the UCI database [11]. Some of them are the average vocal fundamental frequency, the maximum vocal fundamental frequency, the minimum vocal fundamental frequency, several measures of variation in fundamental frequency and several measures of variation in amplitude. After these features were obtained, a 195x23 matrix was formed. For a reliable classification, approximately 80% of the data set of diseased and healthy individuals was selected as training data (156 samples) and the remaining 20% as test data (39 samples). After the data sets were created, the training and test sets were obtained by shuffling the data. The data sets were then classified by a program running in the MATLAB environment.

4.1. Artificial Neural Network

An ANN model with two hidden layers was created, and the training and test data sets were randomly determined. The ANN model constructed for this study is given in Fig. 5.
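As a rough illustration of the model just described, the forward pass of Eqs. (1)–(2) through the two hidden layers and the momentum update of Eqs. (3)–(4) can be sketched in NumPy; the input, weights and gradient below are random stand-ins for the real features and backpropagated errors, not the trained model (the paper's experiments were run in MATLAB):

```python
import numpy as np

def sigmoid(z):
    # Eq. (2): the sigmoid activation used in the model
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Layer sizes from the paper: 22 input features, hidden layers of
# 11 and 5 neurons, and one output neuron.
sizes = [22, 11, 5, 1]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

x = rng.random(22)                  # stand-in for one feature vector
a = x
for W, b in zip(weights, biases):
    a = sigmoid(W @ a + b)          # Eq. (1) then Eq. (2), layer by layer

label = int(a[0] >= 0.5)            # sigmoid output thresholded at 0.5

# Eqs. (3)-(4): momentum update for one weight matrix; the gradient is
# a random stand-in for the value backpropagation would produce.
eta, alpha = 0.8, 0.4               # learning rate and momentum (Table 1)
dJ_dW = rng.standard_normal(weights[0].shape)
dW_old = np.zeros_like(weights[0])
dW = -eta * dJ_dW + alpha * dW_old  # Eq. (4)
weights[0] = weights[0] + dW        # Eq. (3)
```

Because the output neuron is a sigmoid, the final activation lies in (0, 1) and is mapped to a diseased/healthy label by comparing it against 0.5, as described below.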
First, with the forward-propagation step, the output value is computed using the initially random weight values; then the error between the computed output and the expected output is found. This error is used in the backpropagation algorithm, in which the error value is differentiated with respect to the weight values of each layer in turn. The derivative is multiplied by the learning rate, and the weight values are updated accordingly.

Figure 5. ANN model used in the classification process.

The parameters used in the ANN model are given in Table 1. These values were chosen intuitively.

Table 1. ANN parameters.

ANN Parameters                                Value
η (Learning Rate)                             0.8
α (Momentum Coefficient)                      0.4
Number of Neurons in First Hidden Layer       11
Number of Neurons in Second Hidden Layer      5
Activation Function                           Sigmoid

Since the ANN produces a continuous output, the output value is thresholded so that diseased and healthy individuals can be identified. The threshold is set according to the activation function; since the sigmoid is used as the activation function, this value is set to 0.5.

4.2. Support Vector Machines

Using the same data set employed in the ANN model, the classification process was repeated with the SVM method. To this end, a non-linear classification was performed by the SVM model. The kernel function (Radial Basis Function) is given in Eq. (13):

K(x_i, x_j) = \exp\left( -\frac{||x_i - x_j||^2}{2\sigma^2} \right) \qquad (13)

Table 2. SVM parameters.

SVM Parameters     Value
C                  1
Kernel Function    RBF
σ                  1

The parameters used in the SVM model were determined empirically and are given in Table 2.

5. Results and Discussions

The test set error formula used in the results is given in Eq. (14), the Precision and Recall values used for the f1 score are given in Eq. (15), and the f1 score is given in Eq. (16).
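Before turning to the results, the SVM configuration of Section 4.2 can be made concrete: the RBF kernel of Eq. (13) with σ = 1 and the kernelized decision function of Eq. (12) are sketched below. The support vectors, multipliers and labels here are invented for illustration, not learned from the speech data:

```python
import numpy as np

def rbf_kernel(xi, xj, sigma=1.0):
    # Eq. (13): K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * sigma^2))
    return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2.0 * sigma ** 2))

def decision(x, svs, alphas, labels, b, sigma=1.0):
    # Eq. (12): f(x) = sign(sum_i alpha_i * y_i * K(x, x_i) + b)
    s = sum(a * y * rbf_kernel(x, sv, sigma)
            for a, y, sv in zip(alphas, labels, svs))
    return np.sign(s + b)

# Invented support vectors, Lagrange multipliers and class labels.
svs = np.array([[0.0, 0.0], [1.0, 1.0]])
alphas = np.array([0.5, 0.5])
labels = np.array([-1.0, 1.0])
print(decision(np.array([0.9, 0.9]), svs, alphas, labels, b=0.0))  # -> 1.0
```

A query point near the positive support vector is pulled to the +1 class because its RBF similarity to that support vector dominates the sum.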
E = \frac{1}{N} \sum_{i=1}^{N} (T_i - O_i)^2 \qquad (14)

where N is the number of samples in the test set, T_i is the target output value, and O_i is the model output.

P\ (\mathrm{Precision}) = \frac{TP}{TP + FP}, \quad R\ (\mathrm{Recall}) = \frac{TP}{TP + FN} \qquad (15)

where TP is the number of Parkinson's individuals predicted correctly, FP is the number of healthy individuals predicted incorrectly as Parkinson's, and FN is the number of Parkinson's individuals predicted incorrectly as healthy. Precision (P) is the ratio of accurately predicted positive observations to all predicted positive observations. Recall (R) is the ratio of accurately predicted positive observations to all observations in the actual positive class.

f1\ score = \frac{2 \cdot P \cdot R}{P + R} \qquad (16)

The 5-fold cross-validation error and accuracy values obtained are shown in Table 3.

Table 3. 5-fold cross validation values and averages.

5-fold cross      ANN                          SVM
validation        Test Error   f1 score (%)    Test Error   f1 score (%)
1. Trial          0.0980       93.33           0.0862       94.74
2. Trial          0.0483       96.67           0.0517       96.84
3. Trial          0.1204       91.80           0.1034       93.48
4. Trial          0.0739       95.08           0.0172       98.9
5. Trial          0.0549       96.67           0.0517       96.77
Average           0.0791       94.71           0.06204      96.146

As seen in Table 3, the average test error of the ANN model is 0.0791 and its average f1 score is 94.71%, while the average test error is 0.06204 and the average f1 score is 96.146% for the SVM classification. When the values obtained from the models are examined, it can be seen that the classification success rate of the SVM is higher than that of the ANN model. The classification results with ANN are shown in Fig. 6 and the classification results with SVM in Fig. 7.

Figure 6. Classification results with ANN.

Figure 7. Classification results with SVM.

6. Conclusions

In this study, 22 features were extracted from the speech data of Parkinson's and healthy individuals and classified with the ANN and SVM classification methods.
As shown in Table 3, when the SVM and ANN classifications were compared, the classification accuracy was 96.146% with SVM and 94.71% with ANN; the SVM classification was therefore more successful. The decision-support model obtained with the SVM method can help doctors make decisions in the diagnosis of Parkinson's disease.

7. Acknowledgment

This work was performed at the Sensor Laboratory of the Department of Mechatronics Engineering at Kocaeli University.

References

[1] İsenkul M E 2011 Parkinson Hastalığının Teşhisi İçin Veri Toplama ve Örüntü Tanıma Sistemi
[2] Badem H, Çalışkan A, Baştürk A and Yüksel M E 2016 Electrical-Electronics and Biomedical Engineering Conference Yığınlanmış Özdevinimli Kodlayıcı ile Parkinson Hastalığının Sınıflandırılması ve Teşhis Edilmesi p 1
[3] Barışçıl M S, Çetin O, Er O and Demirtaş F 2012 Electric Letters on Science & Engineering Olasılıksal Sinir Ağının (PNN) Parkinson Hastalığının Teşhisinde Kullanılması p 1
[4] Öztemel E 2012 Papatya Publishing and Education Yapay Sinir Ağları
[5] Bayram S, Kaplan K, Kuncan M and Ertunç H M 2013 Turkish National Committee for Automatic Control Bilyeli Rulmanlarda Zaman Uzayında İstatistiksel Öznitelik Çıkarımı ve Yapay Sinir Ağları Metodu ile Hata Boyutunun Kestirimi p 986
[6] Duda R O, Hart P E and Stork D G 2000 New York: Wiley-Interscience Pattern Classification
[7] Vapnik V N 2000 New York: Springer Science & Business Media The Nature of Statistical Learning Theory
[8] Kavzoğlu T and Çölkesen İ 2010 Harita Dergisi Destek Vektör Makineleri ile Uydu Görüntülerinin Sınıflandırılmasında Kernel Fonksiyonlarının Etkilerinin İncelenmesi p 73
[9] Osuna E E, Freund R and Girosi F 1997 Massachusetts Institute of Technology Artificial Intelligence Laboratory Support Vector Machines: Training and Applications p 144
[10] Cortes C and Vapnik V 1995 Kluwer Academic Publishers Support-Vector Networks p 273
[11] Lichman M 2013 CA: University of California, School of Information and Computer Science UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]